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Introduction 


Congratulations! You're ready to discover the easiest-to-read powerful programming 
language—or maybe the most powerful, easy-to-read programming language. That's 
Python, of course. 


With Python For Dummies, you can ferret out just a little or a lot. And with Python, you 
can write a little program that picks a random quote from a file, or you can write a set of 
programs that runs a complex business. 


This book is for you whether you're a student, you're a hobbyist, you need to understand 
more about what your programmer co-workers are talking about, or you're taking the first 
steps on a new career path. 


Python For Dummies gives you everything you need to get to an advanced-beginner level 
of Python programming. And it points you to other resources so you can take your Python 
programming skills even further. 


About This Book 


Python For Dummies is a reference book, which means 
you can read it in any order, and you don't have to read 
every chapter or section. However, to some extent, later 
chapters about more complex Python features rely on 
information introduced in earlier chapters. So if you 
don't understand something you see in a later chapter, 
go to Chapter 3, or go to the chapter on that feature to 
find out more. You can also look in the index to find a 
term or feature you want to know more about. 


Conventions Used in This Book 


This book contains Python code examples. All code examples are in monospaced font 
so they are easy to recognize. Anything that you need to type is also indicated in 
monospaced font so you know exactly which commas should be typed and which 
commas are part of the surrounding sentence. 


Python interactive mode examples include this prompt: >>>. If you don't see the prompt, 
you can assume the code was written in a text editor. 


Foolish Assumptions 


We make the following assumptions about readers of this book: 


You know how to use your computer and its operating system. 


It's helpful but not necessary to know how to set environment variables on your 
computer. It's also helpful to have a Web browser with access to the Internet. 


You have and know how to use a text editor that can produce plain ASCII 
text or files that end with the .txt extension. 


If you don't have a text editor that can do this, we include instructions for setting 
up Python's IDLE programming environment to work with the examples in this 
book. 


You have had a minimal amount of exposure to programming. 


We really do mean minimal. If you had a programming class in high school, or 
wrote a few BASIC programs at one time, or even if you have used HTML tags, 
that counts. 


If you have absolutely no experience with programming, you can still find out 
plenty from this book, but we recommend that you also look at a book or Web 
tutorial designed to introduce programming to beginners. You'll benefit from the 
extended explanations of some concepts that we don't have the space to discuss in 
detail here. 


You might have done some programming in another language. 
Programming knowledge is not required for this book, but people who have 


programmed in other languages have their own sets of issues when transitioning 
to Python, and we provide some material for such people. 


e You know little to nothing about Python. 


If you know Python, this book will still be helpful as a reference or a source of 
tips and tricks you may not be aware of. 


How This Book Is Organized 


This book gives you an overview of Python; the lowdown about all of its major parts, 
structures, and libraries; and a glimpse into some more advanced features. You also find 
out where to go to discover more. 


Part I: Getting Started 


In this part, we introduce Python and situate it among the myriad other programming 
languages available. Python is good for some things and not for others; you find out 
which is which. We provide a hands-on introduction to some of Python's abilities, using 
its helpful interactive mode and its IDLE programming environment. We briefly describe 
each of Python's basic building blocks and show how all these blocks come together by 
dissecting a working program. We sketch an overview of how professional programmers 
design programs and debug code and show you how to put these practices to work to 
make your own programming life easier. 


Part Il: Building Blocks 


Python has six basic data types and many ways to work with each of them. In this part, 
we describe how to work with strings (chunks of text), numbers, lists and tuples (both of 
which store multiple data elements), dictionaries (which associate one element with 
another), and sets (which always contain unique elements, never duplicates). 


Part Ill: Structures 


Python code usually comes in chunks, both small and big, and each chunk does a 
particular thing. This part also includes a brief introduction to some advanced features 
and the new features of Python 2.5. 


Part IV: Libraries 


Python comes with everything you need to write a very powerful program, and other 
people have already solved lots of programming conundrums for you. Its libraries include 
primary services such as communication with the operating system, text processing tools, 
various ways of reading and writing information to disk, and Internet access methods. 


Part V: The Part of Tens 


All For Dummies books include The Part of Tens. In this part, we give you ten useful but 
not-so-obvious programming idioms and ten resources where you can find out more 
about Python. 


Part VI: Appendixes 


Here you find instructions on how to install Python and its documentation, as well as a 
list of new features introduced with each new version of Python since 2.0. 


Icons appear throughout the book to indicate special material. Here's what they mean: 


Tip A Tip explains how to do something a little bit more easily and efficiently. 

Warning A Warning gives you a heads-up about tricky stuff or common mistakes that 
might cause data loss or some other sort of headache. It's best to read Warnings 
to make sure a tricky feature doesn't "getcha." 

TECHNICAL A Technical Stuff icon flags text that's of interest to readers who like 

STUFF to know about the inner workings or history of a subject. You don't 

need to read Technical Stuff material. After you've internalized a 
little about a subject, reading this text might help you understand it 
from a different angle. 

REMEMBER Remember icons highlight important concepts or pieces of information to 

keep in mind. 


Where to Go from Here 


If you want an overview of Python's history and what it can do, go to Chapter 1. If you're 
new to Python and want to start working with it right away, go to Chapter 2. If you want 
a brief overview of all of Python's building blocks, go to Chapter 3. If you know some 
Python and you want a refresher or additional info on some of its tools, go to the specific 
chapters you're interested in. 


Part I: Getting Started 
Chapter List 


Chapter 1: Introducing Python 

Chapter 2: Getting your Hands on the Keyboard—Using Help, Interactive Mode, and 
IDLE 

Chapter 3: Basic Elements and Syntax 

Chapter 4: Grand Tour of the Python Language 

Chapter 5: Working Like a Programmer 


The 5" Wave By Rich Tennant 


"The engineers lived on Jolt and cheese sticks putting this products together, but if you 
wanted to just ‘cola and cheese sticks’ in the Users Documentation, that's okay too. We're 
pretty loose around here." 


In this part... 
You get an overview of the Python programming language, an introduction to its 


interactive and developer environment, and a walkthrough of the building blocks that 
make up Python programs. 


Chapter 1 describes the history of Python and all the exciting things it's being used for 
today. You find out why computers are both the fastest and dumbest things around. Best 
of all, you discover why it's called Python anyway. 


Chapter 2 lets you talk to Python via its interactive mode and IDLE environment. You 
write a few basic programs and find out how to get Python to carry out commands for 
you, how to get Python to tell you things, and how to import tools that let you do even 
more. 


Chapter 3 introduces you to Python's data types and code blocks, the chunks you use to 
build programs. 


Chapter 4 shows you a working program. You see how all the chunks of a Python 
program talk to each other, and you find out something about the design philosophies 
behind Python programs. 


Chapter 5 lets you try on a programmer's hat to understand how programmers work and 
why they make the design decisions they do. (Unfortunately, it doesn't explain the 
relevance of caffeinated sodas to this process—you'll have to figure that out for yourself.) 
There's also a very useful section on strategies for debugging programs, which is a huge 
part of every programmer's job. 


Chapter 1: Introducing Python 


Welcome to Python! If you're the type of person who wants to know what you're getting 
into, this chapter is for you. We give you a quick history of Python and its community of 
developers. You find out what Python is and isn't good for (the "is" section is much 
longer than the "isn't" section) and the most important principles of good Python 
programming. If you're new to programming, you'll see how it's very similar to a task 
you're probably familiar with. 


The Right Tool for the Job 


Python is a general-purpose, high-level language that can be extended and embedded 
(included in applications as a tool for writing macros). That makes Python a smart choice 
for many programming problems, both small and large, and not so good for a couple of 
computing tasks. 


Good uses of Python 
Python is ideal for projects that require quick development. It supports multiple 
programming philosophies, so it's good for programs that require flexibility. The many 


packages and modules already written for Python provide versatility and save you time. 


The story of Python 


Guido van Rossum created Python and is affectionately bestowed with the title 
"Benevolent Dictator For Life" by the Python community. In the late 1980s, Guido liked 
features of several programming languages, but none of them had all the features he 
wanted. Specifically, he wanted a language that had the following features: 


¢ Scripting language: A script is a program that controls other programs. Scripting 
languages are good for quick development and prototyping because they're good 
at passing messages from one component to another and at handling fiddly stuff 
like memory management so that the programmer doesn't have to. Python has 
grown beyond scripting languages, which are used mostly for small applications. 
The Python community prefers to call Python a dynamic programming language. 

¢ Indentation for statement grouping: Python specifies that several statements are 
part of a single group by indenting them. The indented group is called a code 
block. Other languages use different syntax or punctuation for statement 
grouping. For example, the C programming language uses { to begin an 
instruction and } to end it. Indentation is considered good practice in other 
languages also, but Python was one of the first to enforce indentation. Indentation 
makes code easier to read, and code blocks set off with indentation have fewer 
begin/end words and punctuation to accidentally leave out (which means fewer 
bugs). 

¢ High-level data types: Computers store everything in 1s and Os, but humans need 
to work with data in more complex forms, such as text. A language that supports 
such complex data is said to have high-level data types. A high-level data type is 
easy to manipulate. For example, Python strings can be searched, sliced, joined, 
split, set to upper- or lowercase, or have white space removed. High-level data 
types in Python, such as lists and dicts (which can store other data types), 
encompass much more functionality than in other languages. 

e Extensibility: An extensible programming language can be added to. These 
languages are very powerful because additions make them suitable for multiple 
applications and operating systems. Extensions can add data types or concepts, 
modules, and plug-ins. Python is extensible in several ways. A core group of 
programmers works on modifying and improving the language, while hundreds of 
other programmers write modules for specific purposes. 

¢ Interpreted: Interpreted languages run directly from source code that humans 
generate (whereas programs written in compiled languages, like C++, must be 
translated to machine code before they can run). Interpreted languages run more 
slowly because the translation takes place on the fly, but development and 
debugging is faster because you don't have to wait for the compiler. Interpreted 
languages are easier to run on multiple operating systems. In the case of Python, 
it's easy to write code that works on multiple operating systems—with no need to 
make modifications. 


People argue over whether Python is an interpreted or compiled language. 
Although Python works like an interpreted language in many ways, its code is 
compiled before execution (like Java), and many of its capabilities run at full 
machine speed because they're written in C—leaving you free to focus on making 


your application work. Guido began writing Python during his Christmas vacation 
in 1989, and over the next year, he added to the program based on feedback from 
colleagues. He released it to the public in February 1991 by posting to the Usenet 
system of newsgroups. In Guido's words: "The rest is in the Misc/HISTORY file." 


Fast development 


High-level features make Python a wise alternative for prototyping and fast development 
of complex applications: 


Python is interpreted, so writing working programs and fixing mistakes in 

programs is fast. 

TECHNICAL Programs written in interpreted languages can be tested as 

STUFF soon as they're written, without waiting for the code to 
compile. 

Python takes care of such fiddly details as memory management behind the 

scenes. 

Python has debugging features built in. 


REMEMBER All these features make Python a good language for 


e Off-the-cuff, quick programming 
e Prototyping (sketching the design basics of complex programs, or 
testing particular solutions) 


e Applications that change, build on themselves, and add new 
features frequently 


Programming styles 


Python is a multi-paradigm language (meaning it supports more than one style or 
philosophy of programming). This makes it good for applications that benefit from a 
flexible approach to programming. Python includes tools for the following paradigms: 


Object-oriented programming (OOP for short) is one of the popular programming 
styles that Python supports. OOP breaks up code into individual units that pass 
messages back and forth. 
Tip Object-oriented programming is good for applications that have multiple 
parts that need to communicate with each other. 
Python has features in common with the following languages. If you know these 
languages, you'll find features in Python that you are familiar with, making 
Python easier to learn: 
o Java: An object-oriented language especially for applications used over 
networks 


o Perl: A procedural language used for text manipulation, system 
administration, Web development, and network programming 
Tcl: Used for rapid prototyping, scripting, GUIs, and testing 
Scheme: A functional programming language (a language that focuses on 
performing actions and calculations by using functions. 


For more about functions, see Chapter 11, and for an intro to functional 
programming, see Chapter 16.) 


REMEMBER Python For Dummies includes a brief introduction to object-oriented 
programming (Chapter 13), an overview of using Python for Web 
development (Chapter 20), and tips for scripting and testing. 


Versatility 
Python modules (collections of features for performing tasks) let Python work with 


e Multiple operating systems and user interfaces 
Tip With Python For Dummies, you can write and run programs on Windows, 
Mac, and Unix (including Linux). Python programmers have also written 
code for other operating systems, from cell phones to supercomputers. 
e Special kinds of data (such as images and sound) 


Python comes with dozens of built-in modules. New modules can be written in either 
Python or C/C++. 


Companies that use Python 


The main portal to Python and the Python community is http://www.python.org. This 
portal contains a page that lists some companies that use Python, including 


e Yahoo! (for Yahoo! Maps) 

¢ Google (for its spider and search engine) 

e Linux Weekly News (published by using a Web application written in Python) 

e Industrial Light & Magic (used in the production of special effects for such 
movies as The Phantom Menace and The Mummy Returns). 


Other commercial uses include financial applications, educational software, games, and 
business software. 


Convenience 


Most programming languages offer convenience features, but none boast the combination 
of convenience and power that Python offers: 
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Python can be embedded in other applications and used for creating macros. 
For example, Python is embedded in Paint Shop Pro 8 and later versions as a 
scripting language. 

Python is free for anyone to use and distribute (commercially or 
noncommercially), so any individual or company can use it without paying 
license fees. 

Python has powerful text manipulation and search features for applications 
that process a lot of text information. 

You can build large applications with Python, even though it doesn't check 
programs before they run. In technical terms, Python doesn't have compile-time 
checking. Python supports large programs by connecting multiple modules 
together and bundling them into packages. Each module can be built and tested 
separately. 

Python includes support for testing and error-checking both of individual 
modules and of whole programs. 


Sometimes, Python isn't so hot 


Python by itself isn't best for applications that need to interface closely with the 
computer's hardware because 


Python is an interpreted language. 
Interpreted languages are slower than compiled languages. 


Python is a high-level language that uses many layers to communicate with the 
computer's hardware and operating system. 


REMEMBER Python might not be the best choice for building the following types of 


applications and systems: 
Graphics-intensive applications, such as action games 


But some games use Python because specialized modules can be written to 
interface with hardware. The pygame module is one such package. (Modern 
computers are extremely fast, which means it's more important to be able to write 
clean code quickly than to get maximum speed out of the software, except for the 
most graphics-intensive games.) 


The foundations of an operating system 


The Python developer community 


Python has attracted many users who collectively make up a community that 


Promotes Python 
Discusses and implements improvements to the language 
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e Supports newcomers 
e Encourages standards and conventions that improve Python's usability and 
readability 
e Values simplicity and fun (after all, Python was named after Monty Python, the 
British comedy troupe) 
The Python community has created words to describe its philosophy: 
Pythonic identifies code that meets the following criteria: 
e It includes interfaces or features that work well with Python. 
e It makes good use of Python idioms (standard ways of performing tasks) and 


shows understanding of the language. 


Unpythonic code is roughly translated from other languages instead of following Python's 
philosophy. 


Pythonistas are knowledgeable users of Python (especially users who promote the 
language). 


Cooking Up Programs 
Writing programs is a little bit like working with recipes. For example, you can 
e Write a recipe to make bread from scratch. 


In Python, you can build a program from scratch, writing all your own code and 
using only Python's basic built-in functions. 


e Use the product of one recipe in another recipe (for example, a recipe for 
turkey stuffing uses bread as an ingredient). 


After you write program that performs a basic task, you can insert it into other 
programs the same way you add any ingredient to a recipe. 


e Buy premade bread. 
Python comes with many modules, which are sets of programs other people have 
written that you can plug into your program, just like you can buy bread at the 


store without baking it yourself. 


Python's even better than bread because most Python modules are free! 
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When you write a program, you are telling the computer to do something. Python For 
Dummies gives you step-by-step instructions that help you understand how to write the 
way a computer "thinks." 


REMEMBER Unlike you, computers are pretty stupid. They can do only a few things. 
All the actions that humans make them do are the result of the computer's 
doing those few things over and over, in different combinations, very 
quickly. 


Training your assistant 


Imagine that you're a baker, and you have taken on an apprentice baker who is as stupid 
as a computer. If you want to show your baker how to make bread from scratch, you need 
to start with very basic steps. You've already started by putting warm water and sugar in a 
small bowl. Then you and the apprentice have this conversation: 


e You: "Add a package of yeast." 

e¢ Apprentice: "I can't find a package of yeast." 

e You: "The refrigerator is over there. Inside the refrigerator is a little package 
labeled Yeast. Go get it." 

e The apprentice gets the package and says, "Now what?" 

e You: "Put the package in the bowl." 

e The apprentice puts the package in the bowl. 

e You: "Hey! Open the package first!" 


By now you might doubt the wisdom of hiring an apprentice baker who needs to be told 
things that seem completely obvious to you. But if you persevere, you'll come out ahead. 
If this apprentice is like a computer, then after finally figuring out how to bake bread in 
your kitchen, your new baker will be able to prepare 100 loaves a minute! 


Combining ingredients 


When your apprentice baker knows all the procedures involved in baking bread, such as 
finding the ingredients on the shelves, finding the pots and pans, mixing ingredients, and 
operating the oven, you can assign other tasks that use those same procedures. Baking 
bread involves combining ingredients in a bowl, so if you need to combine ingredients 
for another recipe, the apprentice already knows how to do that. So when you want to 
explain how to make cookies, you can now say "combine sugar, flour, and butter in a 
bowl" without explaining where to find the bowls or the sugar. 


REMEMBER In Python, after you've written a program to do something, you can import 


it into another program. So the more you work with Python, the faster 
you'll be able to write programs. 
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Chapter 2: Getting your Hands on the 
Keyboard—Using Help, Interactive 
Mode, and IDLE 


Overview 


Even if you haven't used Python or another programming language before, it's easy to get 
up and running with Python. You don't even have to know how to write a complete 
program because you can run Python in interactive mode. In interactive mode, you can 
tell Python what to do one instruction at a time, or you can write small portions of code to 
see how Python handles them. In this way you can learn by doing, trying things out at 
your own pace. 


If you've worked with other programming languages, you're probably eager to get into 
the workings of Python and see how it compares. This chapter introduces you to some of 
the tools you'll use as you develop Python programs, as well as some of Python's basic 
syntax. 


Ready for a full-on development experience? Or just curious what a debugger is? Then 
go on to the "IDLE Musings" section about Python's very own development environment, 
IDLE (Integrated DeveLopment Environment). This comprehensive set of tools supports 
you when you are writing, testing, and finding or fixing mistakes in programs. 


Tip In most of this book, you read and experiment on your own; it's structured so that 
you can pick up information without reading sequentially. However, if you're new to 
programming, you might find it useful to read all of this chapter and try some 
examples before going on to the rest of the book. In the following sections, you get a 
good "hands-on" foundation with Python's interpreter, which will make you more 
comfortable when you move on to writing your own programs. 

Tip If you want to get an overview of Python's features, jump ahead to Chapter 3. We'll 
be waiting for you here when you want to find out more about interactive mode or 
the IDLE editor/debugger. 


If you need to install Python, Appendix A has the instructions. 
Two Ways to Interact with Python 


One of the reasons Python is easy to use is that it comes with tools that help you design, 
write, and debug your programs. 


This chapter describes two of these tools: 
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e Interactive mode: In interactive mode, you type instructions to Python one line 
at a time—much the same way that an operating system (shell) accepts 
instructions from a command line. You can also write short multiline programs or 
import code from text files or from Python's builtin modules. Interactive mode 
includes extensive help, too. With interactive mode, you can explore Python's 
abilities. 

e IDLE: The IDLE development environment includes Python's interactive mode 
and more—tools for writing and running programs and for keeping track of 
names. 


IDLE is written in Python and shows off Python's considerable abilities. 


Going One-on-One in Interactive Mode 


You can do most anything in interactive mode that you can do in a Python program— 
even write multiline programs. Think of interactive mode as 


e Asandbox for experimenting safely with Python 
e =A tutor 
e A tool to find and fix problems (bugs) in your programs 


Warning You can't save what you type in interactive mode. If you want to keep a copy of 
what you wrote, save your code and results in a file on your computer. 


You can use interactive mode as a calculator. You can manipulate text and make 
assignments in interactive mode. Finally, you can import modules, functions, or parts of a 
longer program and test them. These features can help you 


e Experiment with Python objects without writing long programs. 
e« Debug programs by importing parts of your program one at a time. 


Starting interactive mode 
To start Python's interactive mode, follow these steps: 


1. Open a command window. 
o If you're using Mac OS X, open the Terminal application and select File 
C.>New Shell. 
o If you're using Windows, open the Command Prompt window. 
o If you're using UNIX, either open a new shell window or just type in your 
existing shell window. 
2. Type python. 


When Python opens, you see the text shown in Figure 2-1. 
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When Python's interactive mode starts up, it tells you what version is running, the date 
the version was released, and a few hints about what to do next. Then it displays the 
Python prompt: >>> 


Python 2.5bl (e25b1+47038K, Jun 20 2006, 16117455) 
[OCC 4.0.2 (Apple Computer, Sec. build $341)) om darwiz 
Type “help, ‘copyright’, “oredite* or “license* foc nore informetion, 


Figure 2-1: Python's interactive mode in a Terminal window. 
Why computers are always saying "Hello, World!" 


"Hello, World" programs are a computer programming tradition. According to the free 
Internet encyclopedia Wikipedia (http://www.wikipedia.org), the first instance of a 
computer program that printed "Hello, World" occurred in 1973, in a book called A 
Tutorial Introduction to the Language B, by Brian Kernighan. Since then, a "Hello, 
World!" program has been written for almost every computer language. Wikipedia lists 
more than 170 "Hello, World!" programs written in languages from 4GL and ActionScript 
to UNIX shell and XUL. 


One reason that "Hello, World" programs are popular is that a program that prints a single 
statement is usually the shortest working program in a language. 


e In Python, the shortest working program is one line long. 
e In Java, the program is five lines long. 


Aren't you glad you're using Python? 


Following the rules of engagement 
The basic method for working with interactive mode is simply this: 


1. Type a statement or expression. 
2. Press the Return or Enter key. 


When you press Return, Python interprets your input and responds if what you typed 
calls for a response or if the interpreter doesn't understand what you typed. 


In the following example, the statement tells Python to print a string. Because the 
statement doesn't specify where to print the string, Python prints it to the screen (the 
default behavior in interactive mode). 


SSS jowiliac “Eelil@, Worllell™ 
Hello, World! 
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This statement is a whole Python program! Pretty simple, eh? When you use interactive 
mode, Python processes each line of code you type as soon as you press Return (unless it 
sees that you are writing a multiline chunk of code), and the results appear underneath. 
Seeing information about a Python object 


In interactive mode, there are two ways to see information about an object: 


e Type the object (or its name) and press Return. 
e Type the print command and the object (or its name) and press Return. 


What you see depends on what the object is. 


e With some data types (integers and lists, for example), the two methods of seeing 
the value give the same result—as in this example, in which the name stores a list: 


Oo S55 se = [372] 
OQ SSS ke 

Oo [sy 2] 

QO SSS jouealialic ox 
OQ [3p 2] 


e With strings, the result of typing print name and pressing Return is slightly 
different from the result you get by typing name and pressing Return. When you 
just type name and press Return, the value is surrounded by quotation marks, but 
when you type print name and press Return, there are no quotation marks. (To 
find out why, see the sidebar, "Representing data".) 


This example shows the difference between using just the name and using the 
print statement with the name: 


SSS ox = Vinyysicieaiave4 
Poo OX 

ThaiyAsiie re aiavej 

SSS joieiine sx 
mystring 


e When the name refers to a code block (for example, a function, module, or class 
instance), looking at the name shows you information such as the kind of data, the 
name, and the storage location. 


This example creates a class called Message and displays information about the 
class: 


>>> class Message: 
pass 
>>> Message 
<class main .Message at 0x58db0> 
>>> print Message 
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main .Message 


Representing data 


Why do you sometimes see different results when you type name and when you type 
print name? Just typing name and pressing Return is a shortcut for using the function 
repr () to display the result, whereas the print command uses the function str() to 
display the result. 


In technical terms, when you type an object name, literal, expression, or statement and 
press Return, Python evaluates the object. That is, Python runs the code and 
returns/displays the result. 


According to Python's built-in documentation, the function str() returns a "nice" string 
representation of an object. The function repr () returns the "canonical" string 
representation of the object. Here's the difference between a "nice" and a "canonical" 
representation of a floating point number: 


SSS 3.2 canonical 
3.2000000000000002 

SSS siti (3.2) iMaaLe 

US ae 

SSS ie (3.2) canonical 
'"3.2000000000000002' 

SSS jeime 352 MLS 

Boe 


The canonical representation usually tries to be a chunk of text that, when pasted into the 
interpreter, re-creates the object. This example shows how: 


>>> mytuple = (3, 4) 

SS> mylliisi = |[l, 2", myrrwole| 
>>> print repr (mylist) 

[lp "20, (3p 4) 1 

SSS imylist == fi, '2", (3S, 4] 
Tews 


(Note that some objects, such as files, can't be re-created by repr (). You can still use the 
output of repr () as debugging info when working with such objects.) 


Here's an example of what str() and repr () return when you give them a class as an 
argument: 


>>> class Message: 
pass 


>>> str (Message) 

"tian Message" 

>>> repr (Message) 

"‘<Glass wedi Message ee Oxatied>! 
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Seeing the result of the last expression 


When you type an expression by itself in interactive mode, or when Python returns an 
expression as a result of something you typed, Python also stores the value of the 
expression in a special name: _ (an underscore character). This name is available only in 
interactive mode. To see the value stored, type _. 


Se Miskeulile, Wierellel 
"Hello, World!" 


ee 

NEILL, Whesellel!! © 

TECHNICAL Note that doesn't store the results of any statements (assignments 

STUFF such as x=25 or commands such as print). In the following example, 
_ continues to hold the value of the expression even after a statement 
was typed: 


>>> "Hello, Nurse!" 
Maelo, Nites 
>>> x = 35 
P>> 
"Hello, Nurse!'! 
Warning Don't rely on _ in long segments of code. The value stored in _ may change 
unexpectedly if you aren't paying close attention to the difference between 
statements and expressions. 


Manipulating strings and lists 


You can use Python's interactive mode to see a few of the interesting tricks Python can do 
with string and list data. (We cover strings and lists in Chapters 6 and 8.) 


Of printing, commas, and space 


When you want to print several strings, or a string and the value of a name, you can use a 
comma to stand for a single space in the printed output. The following example shows the 
comma in action: 


>>> y = "The meaning of Life, the Universe, and Everything is" 
>>> x = 42 

PSS joan Wy xX 

The meaning of Life, the Universe, and Everything is 42 


Measuring and splitting strings 


The function len () returns the length of a string, as illustrated here: 


>>> x = "supercalifragilisticexpialidocious" 
>>> len (x) 
34 


REMEMBER 1en() works with other sequence data types, too—for example, if you 
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give it a list as an argument, it returns the number of items in the list. 


The method split () breaks a string into separate words and returns a list 
of the words, like this: 


SSS x = Milas 2S aim exqoeicioie |)” 
SSS x<, Sill ae () 
[i tanres!, "as", Vem, Vesxqjoeirieoie! !] 
TECHNICAL The split () method actually breaks a string wherever it finds white 
STUFF space, so sometimes it doesn't break the string where you expect— 
for example: 


>>> 'one and/or two' 
one Vand //Orl yeu ewon| 


Using interactive mode as a calculator 


The Python interpreter can be used like a calculator. If you just want to do simple 
calculations, you can type numbers and operators and press Return to see the result, as in 
the following example: 


SSS CL + 3) = (2 4 2) 

16 

SSS i a 8 2a 2 

9 

Warning Don't use an equals sign (=) when doing calculations like these. In Python, an 
equals sign gives a name to a value. If you use = to try to get the result of a 


calculation, Python gives you an error message: 


SS al a gr 2a 2 = 
Walle “<ereclaiias, Ihatioey AL 
ioe oe Ba 2 = 


“A 


SyntaxError: invalid syntax 


You can also use names to do math in the Python interpreter. This is easier when doing 
calculations with several steps, like the following: 


Sas xx = i a 3 
SSS y= 2a 2 
SSS x 
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Warning If you type all whole numbers (integers) when you're doing arithmetic, Python 
returns the result in integers. If you want precise results in calculations 
involving division, be sure that at least one of the numbers in a calculation is a 
decimal number, or type the statement from future import division 
before doing your calculations. Doing the latter imports the true division feature 
from a special module called — future _, which contains improvements that 
will be automatically activated in later versions of Python. To find out more 
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about using Python to do division, see Chapter 7. 


These examples show how to get correct division results: 


sss 13/3 # That can't be right 
4 

SSS 13.,0/3 7 Minaie’s imoree Ilake sie, 
A SIIFISIISIOIIIIS 

>>> from future import division 
SSS 13/3 


A, SISIIISIIIIIII3 
Working with built-in functions 


In addition to the standard arithmetic operators such as + and *, you have access to a 
number of math functions and other functions built into the Python interpreter, and more 
such functions are available in the math module. 


REMEMBER Functions are pieces of code that 
e Carry out specific operations. 
e Return the results of those operations. 


To work with a function, you need to call it, and you usually need to pass it one or more 
arguments—data you want it to act on. You call a function by adding parentheses at the 
end of the function name. Any arguments you pass to the function go inside the 
parentheses. 


Getting help on a function 
To get help on built-in functions, type help with the name of the function in parentheses. 


The help usually tells you how many arguments the function takes. If an argument is in 
square brackets ([]), it's an optional argument. If you leave out an optional argument, a 
default value is used. 


The help for round(), displayed here, shows that round() takes one required argument 
and one optional argument (the number of digits to the right of the decimal point, which 
defaults to 0): 


>>> help (round) 
Help on built-in function round: 


TO WAC! (5 4.6) 
round(number[, ndigits]) -> floating point number 
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Round a number to a given precision in decimal digits (default 0 
GluLGALiES)) . 
This always returns a floating point number. Precision may be 
negative. 


The following examples call a built-in function named round (), which takes a decimal 
(or floating point) number as an argument. It rounds off the number to the nearest whole 
number and returns the result. 


>>> round (9.9) 
10.0 
SSS woul! (9,3) 
9.0 


Examining names 


When working in interactive mode, you sometimes need to be reminded what names 
you've given to objects. The dir () function, which is built into interactive mode, lists the 
names (such as names of data objects, module names, and function names) that are stored 
in the interactive mode's namespace at any particular point in your coding session. 
(Namespace is a Python term for a list of names that a particular part of a program knows 
about.) 


Tip You can also use the dir() function to examine the contents of modules. 
Examining the namespace 


The following example shows what happens when you start Python's interactive mode 
(so you have not defined anything yet), use dir() to see what is defined, and then give a 
value to a name and use dir() again: 
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% python 
Pyemom 25ioi (e2siols470OSeM, Juin 20 2006, 1s i7355) 


SSS clalie () 

[Ylowallcims ", " cee ", ° name ual 

SPS COO _iiemy CAS = VImpossalole |’ 

SS > olalie () 

[Y_ lowualilicims %, ° Coe %, ' name (7 "TOO itemy Cats! | 


After you give a value the name too many cats, the namespace remembers that name 
and gives you the value if you ask for it, like this: 


>>> too many cats 
"Impossible' 
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Examining a module 


The builtins _ module defines some exceptions (error-handling code), functions, 
and constants. To see what's in the builtins module, simply type 
dir( builtins ). 


There are a lot of things in that builtins module! The following example shows the 
contents of the credits constant for Python 2.5. (A constant, like a name, stores data, but 
the data inside a constant shouldn't be changed.) 


>>> credits 

Thanks to CWI, CNRI, BeOpen.com, Zope Corporation and a cast of 
thousands 

for supporting Python development. See www.python.org for more 
information. 


The capitalized names in the __ builtins _ module are exceptions, messages that Python 
sends when it encounters errors and other unusual conditions. These are all part of a 
module called exceptions, which you can see if you type one of the names into 
interactive mode: 


>>> ArithmeticError 
<type 'exceptions.ArithmeticError'] 


Writing multiline programs in interactive mode 


You can write multiline programs or parts of programs in interactive mode. However, 
note that interactive mode doesn't let you save your work, so you might also want to save 
the code in a text file. 


The following example program prints some kilometer/mile conversions. Here's how it 
looks when typed into a text editor. (If you don't understand everything the code does, 
don't worry. String formatting is explained in Chapter 6.) 


for miles in range(10, 70, 10): 
km = miles * 1.609 
Deine sd mMelese => oSo2 bk lometersiuec (ma les; sekm)) 


Note that some lines are indented. Indentation is very important in Python because it's 
how Python tells when you are using a code block (several lines of code that are grouped 
together). Four spaces is standard for one level of indentation (the lines starting with km 
and print), eight spaces for two levels, and so on. 


Here's how the same program looks when you start to type it in interactive mode. When 
you press Return after the first line, the interpreter recognizes that you're writing a code 
block and displays a continuation prompt: ... (three dots). Even though you see a 
continuation prompt, you still need to indent the lines that are part of the code block, so 
don't forget to add four spaces before starting the km assignment and the print statement. 
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SSS rei mlles iim ramge(lO, WO, iO) s 
km = miles * 1.609 
print "Sd miles --> %3.2f kilometers" % (miles, km) 


Tip Okay, we admit it: We sometimes make our thumbs happier by adding only two 
spaces when we're working in interactive mode. For actual programs that others 
might see, though, it's a Good Idea to use four spaces. 


When you are finished with the code block, you press Return at the continuation prompt 
without typing anything. The interpreter runs the code, which gives this result: 


10 miles --> 16.09 kilometers 
20 miles --> 32.18 kilometers 
30 miles --> 48.27 kilometers 
40 miles --> 64.36 kilometers 
50 miles --> 80.45 kilometers 
60 miles --> 96.54 kilometers 


When you run a code block in interactive mode, the names you've defined in the 
namespace retain the values they were given last. 


pes joreslioic lata) 
96.54 

>>> print miles 
60 


Quitting interactive mode 


To exit Python's interactive mode, press Control-D (on Mac or UNIX) or Ctrl-Z (on 
Windows). 


If you type quit, Python reminds you what to do, like this: 


SSS ejullic 

(WSS Ciciel—D) (i.65 WON) wo eoalic. | 

Tip If you have Python 2.5, you can still use Control-D or Control-Z (depending on your 
operating system) to quit, or you can type quit () or exit () (both work on all 
operating systems). The reminder if you type quit without the parentheses is a little 
different (this is the reminder on UNIX): 


SSS Cillitic 
Use guit() or Ctrl-D (i.e. EOF) to exit 


When you exit interactive mode, you return to the command prompt of your Terminal 
window or Command Prompt window. 


Warning When you quit the Python interpreter, all the values you've given names 


disappear. If you want to keep your work, copy any code you want to save into 
a text file. 
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Getting Help 


Beyond Python For Dummies, many other sources of Python help are available to you. 
One of the most complete sources is Python's built-in help system. 


Tip The help built into Python assumes that you have some knowledge of programming 
and about Python. If you're new to programming, some of the help topics might look 
pretty daunting. But after you get a handle on some of the terminology, it gets easier! 


Help in interactive mode 


You can access Python's help system from interactive mode. There are two ways to use 
help. 


Entering the help program 

You can run the help system as a separate program inside the Python interactive mode 
environment. The help system has its own prompt. To go into help mode, follow these 
steps: 


1. Type help() at the Python prompt. 


The help program opens, and you see a welcome message and some suggested 
topics. Then it displays the help prompt: 


help> 
2. Type the name of an item you want to know about. 


For example, you can get help about the list data type and its methods: 


help> list 
3. To quit help mode, type quit at the help mode prompt. 


When you quit help, you go back to Python's interactive mode, and you see the 
Python prompt again: 


Pitan: 
Using help without leaving interactive mode 


You can also use help to get a tip about a particular item without leaving interactive 
mode. To get help about a particular item, follow these steps: 
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1. At the Python prompt, type help followed by the name of the item you want 
to find out about in parentheses. 


For example, to get help about the 1ist data type, type 


>>> help (list) 


If the information is less than one screenful, you go back to the interactive mode 
prompt automatically. 


If there's more information, follow Step 2. 
2. If the information is longer than one screen, press the spacebar to see more. 
When you finishe, type q to go back to the interactive mode prompt. 


Tip Some installations of Python—for example, the installation on Mac OS X 10.3 
(Panther) or 10.4 (Tiger)—don't come with all the Python help installed. If you see a 
message in the help program that documentation cannot be found, you can install the 
documentation (follow steps 1—3 in the next section, "Getting help in a Web 
browser") and tell Python where to look for it. One way to do this, if you're using 
Mac OS X or UNIX, is to set the PYTHONDOCS environment variable to the folder 
path containing the documentation. The way you do this depends on which UNIX 
shell (operating system) you are using. Here's how you do it in Mac OS X's Terminal 
window, using the default bash shell (replace the path with the correct path for your 
docs): 


export PYTHONDOCS=/Library/PythonDocs 
Getting help in a Web browser 
If you want to keep Python documentation open in a separate window while you work, 
you're in luck. The Python documentation is in HTML format, so you can keep it open in 
a Web browser window. 


Follow these steps to keep the Python documentation readily available for your use: 


1. Open your Web browser. 
2. Type http://www.python.org/doc/ in your browser's address box and press 
Return. 


This Web site archives documentation for every version of Python that has been 
released. The documentation for the most recent version is available at 


http://www.python.org/doc/current/. 


3. Follow the online instructions to download the correct version of the help 
files. 
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4. Make a bookmark in your Web browser to the location of the downloaded 
help files. 


Doing this sets up the Python documentation so you can access it quickly. 


5. To open the help files in your Web browser while you're working with 
Python, just select the help file bookmark from your Web browser's 
bookmark list. 


Using Scripts and Modules 


Because Python's interactive mode doesn't save any of your work when you quit, you'll 
want to store your important work in text files. Text files that contain Python code are 
called scripts (if they are whole programs) or modules (if they contain chunks of code 
meant to be imported into other programs). 


Both scripts and modules end with the suffix .py. 


TECHNICAL Actually, the distinction between scripts and modules is fuzzier than 

STUFF that because some modules include code that lets them stand on their 
own or that gives information about their status... and some 
programs can be imported as modules. 


Running a script from the command line 


When you run a Python script from the command line of your shell or Terminal program, 
it behaves like other programs you run from the command line—when it's finished 
running, you get another command prompt. To run a script from the command line 
(indicated by the command prompt %), type python and the name of the script, like this: 


% python script.py 


If you run a script from the command line, you use the -i modifier to tell Python to go 
into interactive mode after you run it. To create a script and run it interactively on a 
command line, follow these steps: 


1. Use a text editor to write your script. 


For example, you might type the following in a text editor: 


print "testing how scripts and interactive mode communicate" 
x = 500 
print "The value of x is ", x 


2. Save your script and give it a name. 
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REMEMBER Use the . py suffix for the name. 
Our script is called tinyscript.py 


3. Torun the script from the command line, type python -i followed by the 
name of the script (including the . py suffix). 


When you run the script, Python executes its code and then enters interactive 
mode (you can tell because you see the >>> prompt). 

 jOWelneGin i LIM ySeIesSoe jw 

testing how scripts and interactive mode communicate 
The value of x is 500 

>>> 


The names that are part of tinyscript.py are now available in interactive mode. 
To test this, type dir() and print the value of any of your script's names. 


SSS Clalie (() 

[Y lotililicams Y, doc Verna file ee! name Vien *Vibse ite] 
SSS JOIeGLIMIE xx 

500 


Importing a module in interactive mode 


Modules are text files containing Python code that's designed to be used inside another 
program. To work with a module, you need to import it. When you import the module, 
Python runs the module's code and stores the module's name in the namespace. 


There are two ways to import modules: You can import the module name or you can 
import one or more of the names defined inside the module. 


The following examples import the module tinymodule.py, which includes the 
following function: 


def tinyfunction (x): 
print "testing how modules and interactive mode communicate" 
print "You passed me the parameter", x 


Yh OT, 
foreskin, OX, Wisvefbieteevol als", % 
return Z 


Importing by name 


Importing a module by name stores the module name in the namespace. But it does not 
directly import any names that are defined in the module. 
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To import a module by name, type import followed by the name of the module without 
the .py suffix, like this: 


>>> import tinymodule 


To make sure that the module's name has been imported, type dir(). 


SS clalie (()} 
[Y joviliiedons 7, : doe Mea name 7, \ iSalinwyanexeluilS \ || 


To use an item inside a module you've imported by name, you need to tell Python both 
the module name and the name of the item. (The items inside modules are collectively 
called attributes of the module.) This is similar to handing someone a book and telling 
them to find a chapter inside it. 


To access an attribute of a module, type the module name, a dot, and the attribute name. 


In this example, we call the tinyfunction() attribute in the tinymodule module, and 
we give a name to its result. This causes Python to run the code in the function. Here's the 
result: 


>>> x = tinymodule.tinyfunction (2) 

testing how modules and interactive mode communicate 
you passed me the parameter 2 

2 squared is 4 


The tinyfunction() function also returns a value. To see it, we print the name we gave 
to the function: 


SS joueilimic sx 
4 


Importing items from inside a module 


Importing items from inside a module stores their names in the namespace, which gives 
you direct access to the items—you don't have to type the module name to use them. But 
this method doesn't store the name of the module in the namespace. 


To import the function that's inside tinymodule, type this: 


>>> from tinymodule import tinyfunction 


To check what's stored in the namespace, type dir (). Note: If you didn't quit Python 
after doing the example in the previous section, "Importing by name," you might see 
other names as well when you type dir(). 


SSS iclalic (()) 
[Y lowatilicims 9, y doc ee name Vo Vealiny ie Winere som ! || 
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Tip Importing a module by name is the recommended way of getting access to a module 
and its functions. Although importing items from inside a module is useful in some 
situations, this can cause problems if the items have the same names as other items 
you're working with. 


Using Python's standard modules in interactive mode 


Python comes with dozens of modules that add functionality when you want it but stay 
out of your way when you don't. 


Listing Python's modules 
To see the list of built-in modules, follow these steps: 


1. In Python interactive mode, type help () and press Return to start the help 
utility. 

2. At the help> prompt, type modules, like this: 

3. help> modules 

4. Please wait a moment while I gather a list of all available 
modules... 


Why modularizing is a good idea 


Why doesn't Python automatically import all of its built-in modules? First of all, 
importing all of its modules would take a LOOOONG time. Second, if it did, there would 
be thousands of names stored in the namespace, and you wouldn't be able to give those 
names to your own data—or if you did use them, you might get unexpected results. 
Because you need to import modules, stuff you don't use stays out of your way, like 
storing your winter clothes in the attic until October. 


Whoa! That's a lot of modules. But don't worry. You can drive a car without knowing 
exactly how everything under the hood works, and the same goes for working with 
Python. Python For Dummies explains the workings of many of Python's modules and 
shows you where to go to discover more about the rest. 


Tip The list of modules differs depending on your version of Python, your operating 
system, and the third-party modules that may have been installed for Python. 


Listing the contents of an imported module 


The names inside a module might include value assignments, other modules that the 
module imports, functions, classes, and so on. 


To see the names that belong to a module you've imported—cither a . py file you wrote 
yourself or one of Python's modules—type dir () with the name of the module inside the 
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parentheses. For example, to import the math module that comes with Python and list its 
functions, follow these steps: 


1. At the Python prompt, type import math. 

2. To list the names (functions and constants) that the math module defines, type 
dir (math). 
Python displays the contents of the math module. 


Getting interactive help for a module's functions 


The help information for a large module can be many pages long. If you know the 
function you want, here's how to get help for that function specifically: 


1. Make sure the module has been imported. 
2. Type help( module name . function_name ). 


Be sure to include the parentheses and the dot between the module name and the 
function name. 


This example shows the help for one of the functions in the math module: 


>>> help (math.pow) 
Help on built-in function pow in module math: 


ONT (0-0 3.) 

pow (x, y) 

Return x**y (x to the power of y). 
IDLE Musings 


IDLE stands for Interactive DeveLopment Environment. (Other computer languages have 
such environments too, but they're usually called IDEs. Python's is called IDLE after Eric 
Idle of Monty Python fame.) IDLE is an editing program written entirely in Python by 
Guido van Rossum. 


TECHNICAL IDLE is installed when you install any recent version of Python, but 

STUFF it doesn't always come with versions of Python that are preinstalled 
on your computer, such as with the Mac OS X. You can find 
instructions for installing Python in Appendix A. 


Opening IDLE 
When you open IDLE, you see a window called Python Shell, as shown in Figure 2-2. 


Depending on your version of IDLE, you might also see other windows; you can safely 
ignore them for now. 
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Figure 2-2: IDLE's Python Shell window on Mac OS. 
The Python Shell window runs in interactive mode; it displays the Python prompt, >>>. 
Typing statements and programs in the Python Shell 


Interactive mode in IDLE works much the same as interactive mode in a shell window 
running Python. Here are some differences: 


e The code you type is colorized to make it easier to distinguish one part of a 
statement from another. 

e When you write multiline statements, IDLE does not display the .. 
continuation prompt. 

e IDLE automatically indents lines for you. 


Figure 2-3 shows IDLE's Python Shell window with the miles/kilometers conversion 
program we introduce in the section, "Writing multiline programs in interactive mode," 
earlier in this chapter (colorizing is not shown—1t's a black-and-white book!). 


Figure 2-3: A program and its result in IDLE's Python Shell window on Mac OS. 
Getting more help for IDLE 

IDLE also has built-in help—took in the Help menu. 

A good "getting started" Web page for a recent version of IDLE is here: 
http://www.hkn.eecs.berkeley.edu/~dyoo/python/idle_intro 


The following pages have additional documentation that you'll find useful, but they are 
based on earlier versions: 
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e = http://www.ai.uga.edu/mc/idle/index.html (based on IDLE 1.0.2) 
e = http://www.python.org/idle/doc (based on IDLE 0.5) 


Clever Python Shell features 
Here are a couple of time-saving features of IDLE's Python Shell: 


e Putting your insertion point or cursor in a line of code and pressing Enter 
inserts a copy of the code on a new line for editing. 

e Typing the opening parenthesis for a function call or method call pops up a 
small box with a helpful tip or a list of expected arguments. 


Writing and editing code with IDLE's text editor 


IDLE includes a text editor for opening, editing, and creating modules and scripts. Here 
are some of its key features: 


¢ To open a new text editing window, choose File New Window. 

¢ To open an existing module or script for editing, choose File is Open. 

e The Format menu available when you're working in the text editing window 
contains Python-specific formatting commands such as Indent Region. 

e Torun a program or module, make sure it is open in the text editor and 
choose Run -;Run Module (on some versions, the command is File C Run 
Script). 


The Python Shell restarts (clearing any names that it was storing from the 
previous session) and Python executes the code. 


Briefly meet a few other IDLE commands 


IDLE has Find commands (in the Find menu or Edit menu) to search for text in the 
frontmost IDLE window. It also has a Find in Files command that searches all files in the 
Python search path or a subset of those files. 


With the Path Browser (available from the File menu), you can examine and open any 
Python code that is in the Python path. (The Path Browser is shown in Figure 2-4.) To 
navigate the path, click the + buttons to expand a folder. To open a . py file in the text 
editor, double-click it. To highlight a class or method in the . py file, double-click it in the 
Path Browser. 
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Path Browser 


_! baddb: package 
: La) compiler: package 
—! ctymes: package 
_ curses: package 
| distutils: package 
eq] email: package 
= mime: package 
Be __init__ py 
B® application.py 
= audlo.py 
i base.py 
HR Image py 
bt Message.py 


=] /Library/Frameworks (Python Framework Versions /2.5/lib/python2.5 P 
+ he 


= 
7. 


Figure 2-4: The IDLE Path Browser. 
Debugging in IDLE 


When you run code in IDLE's interactive mode, it informs you of errors the same way 
you've seen when using interactive mode in a Terminal or shell window. 


When you run code from the text editor, IDLE also informs you of errors. Depending on 
the kind of error and on your version of IDLE, the error message appears in a dialog box 
or in the Python Shell window. 


You can use the IDLE debugger to step through your program in various ways and 
display the values of names. The IDLE debugger is shown in Figure 2-5. 


Warning On the Mac OS, the IDLE debugger that comes with Python 2.4 and earlier has 


a bug(!). When you open the debugger window, you must immediately resize it; 
otherwise, IDLE will hang. 
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Figure 2-5: The IDLE debugger at work. 
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Chapter 3: Basic Elements and Syntax 


This chapter briefly introduces some of the bits and pieces that make up the Python 
programming language. You find out about names and the differences between statements 
and expressions. We introduce several types of data and operators that work on these data 
types. We also explain how to include documentation in your program as well as how to 
understand error messages Python might give you. Finally, you get a taste of various 
kinds of control structures, loops, and other code blocks available in Python. 


Making Names and Storing Values 


In most programming languages, there are ways to associate a name or variable with a 
value. Here's an example of such a statement in Python: 


num = 45 


Many programming languages call a statement such as num = 45 "assigning a value to a 
variable." But Python doesn't really have variables (you don't need to know why), so this 
book uses the term name instead of variable. 


TECHNICAL In Python, a name refers to an object. A name is actually a label for a 

STUFF memory location in the computer that stores something—a value, a 
chunk of code, or any sort of thing Python understands. All these 
"things"—anything that's stored in a memory location—are called 
objects in Python. Therefore, Python programmers often say that 
when you enter a statement such as x = 5 in Python, you are binding 
a name (x) to an object (5). You can have multiple names for the 
same object. 


The = symbol (a single equals sign) tells Python to associate a name with a value. When 
making an assignment, the name is always on the left, and the value is on the right, as in 
this example: 


myname = "myvalue" 


When you've given a name to a value, the name stands in for the value when you are 
writing code. For example, if you've already created the assignment myname = 
"myvalue", the following two statements do the same thing—they assign the name 
mynewname to the value 'myvalue': 


"myvalue" 
myname 


mynewname 
mynewname 
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There are other ways to give a name to a value. For example, you can give a name to the 
result of a calculation or the information that a function returns. The def statement 
(which defines a function) and the class statement (which defines a class) also create 
names. 


Naming rules 
You can use any names you want, as long as they follow these rules: 


e Names must start with either a letter or an underscore character (_ ). 
e You can't use any of Python's reserved words or keywords. 


These are words that have specific meanings to Python, such as if (which begins 
an if statement). 


e Names are case-sensitive. num is different from Num and nun. 

e By convention, most Python programmers use lowercase for names that 
stand for values. They use other rules for names that stand for functions and 
classes—you can find out more about these conventions in later chapters. 

e It's a Good Idea to use meaningful names. If you name everything with a single 
letter, for example, you may have trouble remembering what the name stands for 
later. 


TECHNICAL Statements and expressions: Seeing the difference 
STUFF 
Python code consists of statements and expressions. Knowing these 
terms isn't essential to writing basic code, but sometimes they help 
you to understand an error in your code (for example, you can't use a 
statement where an expression is required). 


A literal is a chunk of text in a Python program that specifies the 
value for one of the basic Python data types (such as a string or 
number). When you run your program, Python creates an object with 
the literal's value. 


A statement is like a command—t tells Python to do something. For 
example, the statement x = 25 tells Python to give the name x to the 
value 25, and print x tells Python to display the value of x. 


An expression 1s one or more operations that produce a result. The 
operations can involve names, operators, literals, and function or 


method calls. 


It's easiest to show the difference between expressions and statements 
by example. Note in these examples that when you enter an 


a 


expression in the interpreter, the interpreter prints it, but when you 
enter a statement(which doesn't create any output—except for print 
statements), nothing visible happens: 


>>> "monty python" # This is an expression and a 
Literal. 

"monty python' 

SSS Kn = 25 # This is a statement. 25 isa 
laeeizal . 

SSS se # This is an expression. 

25) 

Ss > 2am fil, 2, 3] 7 Mails 18 also am Syoressiom, 
True 

SSS Clee iGO (() ¢ # This is a statement. 

Die wecwicn I # return is a statement; 1 is an 
expression. 

>>> foo () #? COO 18 € Memes cOO() as ein 
expression. 

ll 


Note that Python allows you to put multiple statements on a line by 
separating each statement with a semicolon, but you should avoid 
this because it makes programs less readable: 


SS> ek = lp vy = 2 


SSS z, WV 
(ie 2) 
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Data Type Does Matter 


When you write computer programs, you usually want to give the computer information 
(data) and have it do things with the data and give you results. Programming languages 
like Python have several data types that you do different things with. 


You decide which data type to use depending on what you want to do. For example, if 
you want to do mathematical calculations, you might choose a number data type. 


The following list briefly introduces some of Python's data types: 


¢ Numbers are for data that you want to do math with. 

e Strings are for text characters and for binary data. 

e Sequences are for lists of related data that you might want to sort, merge, and so 
on. 

e Dictionaries are collections of data that associate a unique key with each value. 

e Sets are for doing set operations (finding the intersection, difference, and so on) 
with multiple values. 

e Files are for data that is or will be stored as a document on a computer. 


Numeric data 
Python has four built-in numeric data types, as shown in Table 3-1. 


Table 3-1: Python's Built-In Numeric Data Types 
=™Open table as spreadsheet 


Type (keyword) Example Used for... 

Plain integers (int) andlong 7 6666666666L Whole numbers (long integers 

integers (long) are very large whole numbers.) 
Floating point numbers 1.1714285714285713 Real numbers 

(float) 

Complex numbers (complex) (3+4)) Imaginary numbers 

Decimal numbers decimal.Decimal("181.2") Calculations requiring decimal 
(decimal .Decimal) arithmetic 


Except when you're doing division with integers or using the decimal module (see 
Chapter 7), you don't have to worry about what kind of number data type you're using. 
Python converts numbers into compatible types automatically. For example, if you 
multiply an integer and a floating point number, Python automatically gives the answer as 
a floating point number: 


SSS x = 5 


39 


For more information about numbers and number data types, see Chapter 7. 
Sequential data 


Sequential data types contain multiple pieces of data, each of which is numbered, or 
indexed. Each piece of data inside a sequence is called an element. 


REMEMBER The cool thing about sequential data types is that you can manipulate the 
whole sequence, chunks of the sequence, or individual elements inside the 
sequence. 


Three sequential data types are built into Python: 


e Lists can store multiple kinds of data (both text and numbers, for example). You 
can change elements inside a list, and you can organize the data in various ways 
(for example, by sorting). 

e Tuples, like lists, can include different kinds of data, but they can't be changed. In 
Python terminology, they are immutable. 

e Strings store text or binary data. Strings are immutable (like tuples). 


Table 3-2 introduces Python's built-in sequential data types. 


Table 3-2: Python's Built-in Sequence Data Types 
=>Open table as spreadsheet 


Type Kind Example Used for... 

(Name) 

str (String) Immutable * = "monty python" Storing text 

tuple Immutable * = ("a", 2, "33") Storing a set of items you want fast 

(Tuple) access to 

list (List) Mutable * = ["here", "is", Storing a set of items you want to be 
"my", "list", 47] 


able to change readily 


To see the data type of a Python object, use the type () function, like this: 


>>> type ('foo') 
Syoe sic’ > 


Dictionaries 


Python's dictionary (its keyword is dict) is a data type that stores multiple data items 
(elements) of different types. In a dictionary, each element is associated with a unique 
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key, which is a value of any immutable type. When you use a dict, you use the key to 
return the element associated with the key. 


You use a dictionary when you want to store and retrieve items by using a key that 
doesn't change and when you don't care in what order Python stores the items. (In 
dictionaries, elements aren't numbered.) 


Tip Here's what a Python dictionary is not: A Python dictionary bears only a small 
resemblance to the kind of dictionary that contains words and their definitions. In 
Python, a dictionary is more like a list of employees and their employee numbers. 
Because each employee number is unique, you can look up that employee by typing 
his or her number. 

REMEMBER Dictionaries are mutable, like lists, but their keys are immutable. 


Here is an example of a dictionary with two key:value pairs: 


swallow velooniy = ("eumopeam"s YA", Veitiwuceim’s Meo} 
You can find out much more about using dictionaries in Chapter 9. 
Sets 


A set stores multiple items, which can be of different types, but each item in a set must be 
unique. You can use Python sets to find unions, intersections, differences, and so on—just 
like the sets that probably annoyed you in school. 


One use for sets is when you have repetitious data and you want to ignore the repetition. 
For example, imagine that you have an address database and you want to find out which 
cities are represented, but you don't need to know how many times each city appears in 
the database. A set will list each city in the database only once. 


The syntax for a set is a little different from the syntax of the other data types described 
in this section. You use the word set followed by a name (or a group of elements) in 
parentheses. Here is a set that finds each unique element in a list. You'll notice that the 
elements are out of order in the set. That's because Python doesn't store set elements in 
alphanumeric order (the same is true for dicts): 


SSS mvlist = | 'spem!, Vilewely’, “sem, Yolloziews’, ‘sem! | 
>>> set (mylist) 

seis (| Jileweihy!, Velliericms”, Ysjecun’ 

Files 


Python uses the file data type to work with files on your computer or on the Internet. 
Note that the file type is not the same as the actual file. The file type is Python's 
internal representation of a computer or Internet file. 
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REMEMBER Before Python can work with an existing file or a new file, you need to 
open the file inside Python. 


This example opens a file called myfile: 
open ("myfile") 
You can do a lot with files in Python. Check out Part IV to find out more. 
Data types have methods 
In Python, each data type has built-in actions associated with it, which let you do various 


things with the data. These actions are called methods. Methods are tailored to work 
efficiently with their particular data types. 


Calling a method—An example 


Here's an example of how you use, or call, a method on a piece of data. Strings (text 
characters inside quotation marks) come with a method called upper () , which makes all 
the characters uppercase. To use the upper () method on a string, follow these steps: 


1. Give a name to the string, like this: 
2. >>> zed = "lowercase string" 


3. Type the name, a dot, the method name, and a set of parentheses. 


This example changes the zed string's characters to uppercase: 


>>> zed.upper () 
"LOWERCASE STRING' 


TECHNICAL Incidentally, you can also use methods with values that haven't been 

STUFF given names. For example, the code 'hi'.upper() gives the result 
'Hr'. One method that's commonly used this way is the string 
method j0in(), which combines strings. Most of the time, however, 
it's good practice to give names to values before using methods on 
them. 


Passing information to a method 


The parentheses at the end of a method—for example, upper () —tell Python to perform 
the action associated with the method (that is, to call the method). 


Tip If you forget the parentheses, Python gives you some information about the method 
object instead, as in this example: 


>>> 'mystring'.upper 
<built-in method upper of str object at 0x82071a0> 
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The parentheses have another use, too. When calling some methods, you sometimes need 
to include additional information about how the method should act. You put this 
information (also called the argument) inside the parentheses. This is called passing data 
(or passing an argument) to the method. 


For example, here's how you would use a list method called append() to get Python to 
add an item to a list: 


1. Give a name to a list with several text strings like this: 
>» 22> shopping list = [Yeqgs", “secon, "“sjeem? | 
3. To add an item to the shopping list list, type the list name, a dot, the 


method name, and a new item inside parentheses, like this: 
4, SS> slacjspulmg llisic .ejeyerSsiniel (lomicieeie  ) 


No 


You can type the list name again to see the new contents of the list: 


>>> shopping list 
['eggs', 'bacon', 'spam', 'butter'] 


Operators Are Standing By 


Operators are another tool for working with data. You'll recognize arithmetic symbols 
such as + and / for addition and division, but Python also comes with operators that cut 
up text, check whether two objects have the same value, and so on. This section 
introduces several kinds of Python operators: 


e Arithmetic: The math symbols we're all familiar with as well as some less-well- 
known ones. 

¢ Comparison: These operators tell you which of several things is bigger. 

¢ Logical or Boolean: These operators test whether things are true or false. 

e Conditional: This operator allows you to choose one of two values based on a 
logical test. 


Arithmetic operators 

Python understands a variety of math symbols. Here are a few of them: 
e =Plus (+): Addition 
e Parentheses (() ): Grouping 
e Asterisk (*): Multiplication 


¢ Double asterisk («*): Exponentiation (x to the power of y) 


Tip Some of these operators also work on data types other than numbers, but they may 
work differently. For example, + adds two numbers (2 + 2 gives the result 4), but it 
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concatenates strings ('2' + '2' gives the result '22'). You can't use an operator 
with two incompatible data types. For example, if you try to use + with an integer 
and a string, Python returns an error. 


We discuss arithmetic operators in greater detail in Chapter 7. 
Comparison operators 


Comparison operators test the relative sizes of two pieces of data and give either True or 
False as the result. The symbols for these operators are as follows: 


e less than (<) 

e greater than (>) 

e equal to (==) 

¢ not equal to (!= or <>) 


You can use comparison operators by themselves or in combination. For example, you 
can test whether something is less than or equal to something else by using the operator 
<=; 


REMEMBER You use == (two = symbols) to test whether something is "equal to" 
something else. A single = symbol gives a name to a value. 


You know about comparing the sizes of numbers—for example, 3 > 2. But Python can 
compare values of most other data types, too. (There are a few types you can't compare, 
such as complex numbers.) When you compare items by using these operators, the result 
is either True or False (except or some special data types in third-party modules). Here 
are a couple of examples: 


e Strings are compared using alphabetical order, but all capital letters come before 
(are smaller than) all lowercase letters. 
O Som Vel < Mo!" 


e True 
Se A xe 
e True 


¢ Each item ina list or tuple is compared against the item with the same index 
number in the other list or tuple. 

Oo SSS fli < [ai 

OO eS) 


Also, lists and tuples are compared by length. 


So (ile se. Si) Sak, 22] 
True 
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Boolean operators 


Python has three operators that test whether expressions are true or false. These are called 
Boolean operators (you might also see them referred to as /ogical operators). 


The Boolean operators work as follows: 


e and stops testing when it encounters a false condition. 
¢ or stops testing when it encounters a true condition. 


Python tests an expression with and and or operators from left to right and returns 
the last value tested. (These operators don't return True and False unless the 
expressions themselves use comparison operators.) 


SSS WILY guncl i aiavel “ome 


"one! 

SSS Vil" of I or Yona? 
Pale 

Sse (2 « 3) o% (5 > 6) 
True 


¢ not returns True if the expression is false and False if the expression is true. 
O Se) ino Tome’ 
oO 6False 
(Stuff like this is why your grammar teacher told you to avoid double negatives.) 
Tip In Python, false has a special meaning. It refers to anything that is zero or "empty." 


For example, the number 0 is false, and so is a string with no characters ('') and a 
list with no elements ([]). You usually use Boolean operators to test for "emptiness.' 


! 


For more information about Boolean operators, see Chapter 10. 


Conditional operations 


Conditional expressions are new in Python 2.5. They use this form: 


Dy aie (CG Le AG 


They work like this: First, C is evaluated. If C is true, then X is evaluated to give the 
result. Otherwise, Y is evaluated to give the result. 


This example prints 'bar' because False evaluates as false: 


Pee jorealinie Mir@@” aie wellse Gulse Voie! 
loaves 


For more information on conditional expressions, see Chapter 16. 
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Order, please! 


You probably remember from math class that when you do arithmetic, multiplication 
takes place before addition (for example, 3 + 2 x 5 = 13 because 2 x 5 is evaluated first). 
Likewise, operations in Python happen in a specific order, which is called operator 
precedence. As in arithmetic, operations in parentheses come first. 


Table 3-3 describes operator precedence in Python. Operators in higher rows are 
evaluated before operators in lower rows. If multiple operators appear in a single cell in 
the table, that means they are equal in precedence and are evaluated from left to right 
when they appear in an expression. 


Table 3-3: Operator Precedence 
=™>Open table as spreadsheet 


Symbol Type of operator 

(a,b), [a,b], {a:b} | Creation of tuples (via the comma operator), lists, and dictionaries 
S[i],s{i:4], Index and slice operations (on sequence data types); brackets are 
s[i:j:k] also used to select mapping elements 

£(...) Function calls 

x**Y Exponentiation (evaluated right to left) 


x*y, x/y, x//y, %*y Multiplication, division, integer division, modulo (remainder) 
x+y, X-y Addition, subtraction 

x<y, x<=y, x>y, x>=y, Comparison and equality 

x==y, x!=y 

x is y,x is not y Identity 


x in y,x not in y Membership 


not x Logical negation 

x and y Logical and 

K or y Logical or 

if/else Conditional expression 


Special powers of the = symbol 


The = symbol might look like an operator, but in Python, it isn't. The = sign is used for 
assignment—associating a value with a name. 


REMEMBER In Python, don't use = to get the result of arithmetic operations in the way 
you do when you use a calculator. Python stores the results of such 
calculations automatically. Also, don't use = to test whether one thing is the 
same as another (for that, you use == or is). 
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If We May Comment... 


Python, like almost all programming languages, lets you add comments to your program. 
Comments are text in your program that the Python interpreter ignores. Why would you 
want to add stuff that the computer ignores? Comments help other people understand how 
the program works. They also let you decipher your own code six months later! 


To add a comment to your program, type # (a pound sign) and then type your comment. 


You can type # either at the beginning of a line of code or after the code. If you write a 
comment that spans more than one line, use # at the beginning of each line. 


The example below shows comments in action: 


# These are lines from Monty Python's "Parrot Sketch" 
live pareor = ""E"S imim LO tle tijerels |” Michael Palin 
cad pacer = “Bereitt of lite, "© BesiES ium peace!” John Cleese 


It pays to be conventional 


Python programmers have conventions for how they add comments to their programs. 
The conventions make the comments and code easier to read by other programmers. 


Here are some of the guidelines for comments: 
¢ Keep lines to fewer than 80 characters. 
(This convention applies to the whole program, not just the comments.) 
e Indent comments the same amount the code is indented. 


For more information on indenting, see "Deciphering Code Blocks," later in this 
chapter. 


e Use inline comments (comments that come after a line of code) sparingly, and 
only to explain things that aren't clear from reading the code. 


The most important convention for comments is this one from the Style Guide for Python 
Code (http://www.python.org/dev/peps/pep-0008): 


Comments that contradict the code are worse than no comments. Always make a priority 
of keeping the comments up-to-date when the code changes! 


47 


Documenting your program 
Python supports a special kind of comment called a docstring. 


As with other comments, you use docstrings to help others understand your code. What's 
different about docstrings is that they work with Python's help utility so that someone can 
figure out what your code does without having to look at the actual file. 


A docstring is always the first line in a function. It can be more than one line if you begin 
and end it with three quotation marks. Here's an example docstring for a function we 
made up called printme (): 


def printme (me) : 


Woy 


Prints its argument. 


Woy 


Here's what you see if you ask for help on the printme() function: 


>>> help (printme) 
Sls) Gil WeIMNSwLeia joieslinieme skin mechs  iielalim 3 


printme (me) 
Prints its argument 


For detailed instructions on writing docstrings, see Chapter 11. 


Oopsies! Understanding Error Messages 


When Python finds syntax mistakes or other errors in your code, it gives you error 
messages to help you figure out what the problem is. For example, you get this error 
message when you try to use an equals sign (=) to do an arithmetic calculation: 


SS> f= 6 = 
Walle YV<crecligns’, Ilsuieve) AL 
5 =6 = 


“a 


SyntaxError: invalid syntax 


The messages differ depending on the type of error but usually include these types of 
information: 


e The filename 

REMEMBER When you work in interactive mode, the file is always '<stdin>'. 
e The line of code where Python first figured out there was an error 
e The kind of error 
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For a syntax error, a caret (*) points to the problem character (or to the place where the 
interpreter first figured out there was a problem). 


Some error messages include information about the conditions in which the error 
occurred (this is called traceback information). The message you get when you try to 
divide by zero is an example: 


>>> x = 0 
SSS 9 f x 
Traceback (most recent call last): 
Bile “<stcilime”, line 1, iin <mociiile> 
ZeroDivisionError: integer division or modulo by zero 


Deciphering Code Blocks 


In Python, you can write chunks of code, or code blocks, that perform a single function or 
group together several related tasks. We discuss the several kinds of code blocks and how 
to write them in the following sections. 


The big deal about indentation 


In many programming languages, you're encouraged—but not required—to indent a 
chunk of code that works as a unit. Some programmers are notoriously lazy about making 
their code readable to humans, and that means programs in those languages may not 
include indentation. 


In Python, however, indentation is built into the interpreter. Code that's part of a block 
must be indented, or else you get an error message. This is one of the features that makes 
Python especially readable. 


Indentation is so important to Python that Pythonistas printed up a T-shirt: 
Python: 

programming 

the way 


Guido 
indented it 


Code block syntax 


There are two simple syntax rules to remember about code blocks: 
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e The code on the first line of a code block always ends with a colon (:). 
e The code underneath the first line of a code block is indented. 


Python programmers almost always use four spaces for the first level of indentation, eight 
spaces for the next level (a code block nested inside another code block), and so on. 


The indentation must be the same for every line of a block. You get an 
IndentationError message if you use, for example, four spaces on one line of a block 
and three spaces on another line in the same block. 


Basic code blocks: Control structures and loops 


Basic code blocks in Python begin by testing whether a condition or expression is true. 
Python carries out instructions depending on the result of the test. 


In the following sections, we introduce several kinds of control structures and loops and 
provide an example of each. Don't worry if you don't understand every bit of the code in 
the examples right now; it's explained in more detail in Chapter 10. 


If statements 


An if statement carries out one or more tests in a specified order. The first line always 
begins with if; subsequent tests begin with e1if (which stands for "else if"), and you can 
add an else clause that runs if all the tests fail. 


Here's an example of an if statement that includes an else clause. The else: line is not 
indented under the if: line because it's part of the structure of the if statement, as 
opposed to being one of the blocks inside the statement: 


if statement == "We're fresh out of red Leichester, sir": 
response = "Oh, never mind, how are you on Tilset?" 
elif statement == "We have Camembert, yessir"™: 
response = "Fetch hither the fromage de la Belle France!" 
else: 
response = "I'm going to have to shoot you." 
For loops 


A for loop repeats a block of instructions, usually a specific number of times. 


This example takes each item in little list one at a time and prints it. 


pee little list = "tae", Vemaielk”, Newowim’, “itor” | 
SSS Or ele wieem win Lacie lisiceg 
IeIIME iene aliciem, Ye, 


hele: =< CPULGLEle = Igieounay = deep 
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Check out Chapter 10 for the lowdown on for loops. 


While loops 
A while loop repeats an instruction as long as a particular condition is true. 


The following example code prints a number as long as the countdown value is greater 
than zero. It also subtracts 1 from the countdown value each time around the loop. 


countdown = 10 
while countdown: 
print countdown, 
countdown -= 1 
jorcakione “Moyilaiisiic@ye ie 1M 


Here's what it prints: 


10 9 8 7 6B 4 3 2 i lollaisicoitic! 
Try statements 
A try clause attempts to run some code. It is accompanied by one of the following: 
e One or more except clauses (which catch errors raised in the try block) 
Atry/ except statement can also have an optional else clause. 
e A finally clause 
This clause runs automatically after the try clause and re-raises any errors. 


Tip Starting in Python 2.5, a try clause can have all the above elements at once: try, 
except, else, and finally. 


Here's a try statement that receives some input, does one thing if the input is an integer, 
and does another thing if it isn't: 


wise dmjewic = wel anjoulic (MEiMeeic eum aimnicecieies )) 
ilerlanyas 

number = int (user input) 

print "You entered", number 
except ValueError: 

print "Integers, please!" 


With blocks 


The with statement (new in Python 2.5) executes a block in a particular context. This is 
an advanced feature that we discuss in Chapter 16. 


a 


Code blocks that create a namespace 


Some Python code blocks set up a special area to store the names they know about. 
(Names are labels for areas of computer storage that hold particular values or other 
objects.) These areas are called namespaces. 


Each namespace is self-contained, so names from different parts of a program don't bump 
into each other. This feature lets Python programmers create modular code that can be 
extended in many different directions without having to worry about whether their new 
code uses the same names as code in other blocks, modules, or programs. 


Several kinds of code blocks create namespaces. Here is a brief introduction to some of 
them. 


Functions 


A function is like a blender or a bread maker. You put data into it, it does things with the 
information, and it returns a result. 


Here is a small function (you may note there are no spaces at the end of the text strings. 
The comma stands for a single space): 


>>> def myfunction (x): 
VS EF 
iim: x, VeEuLSecl iO tine joower or", =, Yas", vy 
leSie Wied iy 


And here's how you might call it. 


>>> result = myfunction(5) 

5 raised to the power of 5 is 3125 
>>> result 

3125) 

Modules 


A module is a text file that can contain any kind of Python code, but a module usually 
organizes tools that work in a particular way. Sometimes it also includes names that store 
particular values. You can make a module available to another program by importing it. 


TECHNICAL Modules are the key to Python's portability and power. Python itself 
STUFF is made up of modules. 


For example, Python has a math module that includes a number of specialized 


mathematical tools, and it also defines some names, including pi. Here's how you import 
it: 
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>>> import math 


Here's how you access the name pi that it defines: 


>>> math.pi 
3 IANS O2635. 3 589) 19) 3) 


This book covers many of the important modules that come with Python. For instructions 
on viewing the list of modules that Python comes with, see Chapter 2. 


Classes 


A class is similar to both a factory and a blueprint in that it makes copies of itself—but 
the copies are what do the actual work. Here is an example of a class: 


class SayMyName: 
clei liavic (SGI, mihymenne)) ¢ 
self.myname = myname 
def say(self): 
print "Hello, my name is", self.myname 


You use a class to create objects called instances that can do specific things. This code 
creates an instance of the SayMyName class: 


namel = SayMyName ("Aahz") 


An instance has access to the class's methods (which are just functions attached to a 
class). This code applies the say () method to the name1 instance: 


>>> namel.say() 
Hello, my name is Aahz 


Classes are useful because they can combine both data and methods that operate on that 
data. Python's data types—lists, strings, and so on—are based on classes. 


Be) 


Chapter 4: Grand Tour of the Python 
Language 


In this chapter, we present and analyze a whole Python program so you can see how all 
the parts work together. 


Tip You may want to keep a bookmark on the program so you can flip back and forth 
between the whole program and the discussion. 


The spider.py Program 


Our sample program (Listing 4-1) finds all the pages within a Web site. When you run it 
from the command line, it prints the results to stdout (the operating system's standard 
output stream). You can also use it as a module to make it a building block in a larger 
program. 


REMEMBER The program doesn't check links to Web pages external to the site. 
This program has three parts: 


e The import statements 
e The function and class definitions 
¢ The body of the program, which calls the functions and class 


Listing 4-1: spider.py 


HERE HEE HEH HE EE EEE HE HE RHE EEE HE HH EE RHE HEE 
program: spider.py 

author: aahz 

# version: 1.1 

date: June 2006 

# description: start on command line with URL argument. 
# Finds pages within a web site. 
FEREE EE EEE EEEEA EE EERERERER EEE HEHEHE EEE EERE RE 


# These modules do most of the work. 
import sys 
import urllib2 
import urlparse 

import htmllib, formatter 

ECO GSiecliaCG WO) aor Sicwaline lO 


Cer los sieclowie (msi) 3 
"""Print msg to the screen.""" 
print msg 
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CIE Cjene joerc (uuell,, Log) 8 
""wRetrieve URL and return contents, log errors.""" 
ileslanyas 
page = urllib2.urlopen(url 


a 


except urllib2.URLError: 
log("Error retrieving: " + url) 
iesiciicim YY 


body = page.read() 
page.close() 
return body 


CISiE iesinvel_ ames (lonenmil)) § 
WiMReictiem @ asic © Ilainkks alias loreal , 
# We're using the parser just to 
writer = formatter.DumbWriter (StringI0() ) 
f = formatter.AbstractFormatter (writer) 
parser = htmllib.HTMLParser (f) 
parser. feed(htm1) 
parser.close() 
return parser.anchorlist 


class Spider: 


Woy 


The heart of this program, finds all links within a web site. 


run() contains the main loop. 

PLOCESS PEGS) LSeKIEWSS CaCI PAGS aimel times cle limes. 
cei amit (Seli, SirarcURL, lhog=aNeme) 3 

# This method sets initial values 

self.URLs = set() 

self.URLs.add(startURL) 

self.include = startURL 

Seli., links tO process = | StaccURbL] 

if log is None: 


Use log stdout function if no log provided 


SEli log = log siccdoitic 
else: 
self.log = log 


def run(self): 


# Processes list of URLs one at a time 
while SGlir, Jimks co jpirecess s 
wel = sell, _limks co Process .S0p () 
sieulie . lleye; (Miekenesealewyatinves Yar qbliell)) 


SOLE PKOCOSS jOseiS (wie L) 


Clot hel ain Siice(selic, Lalas) 3 
+ Checks whether the link starts with the base URL 
return link.startswith (self.include) 


CS [OMOCESS Sac (Selle, well) ¢ 
+ Retrieves page and finds links in it 


aD 


Inieml = Ge pace (uel, seilic. les) 

ioe lame alm itasael Nalintes (lniemll) g 

Handle relative links 

link = urlparse.urljoin(url, link) 

Selle . here; (VClagelciinesg ¥ ap iLatint<)) 

Make sure this is a new URL within current site 

dit diame MO alia SSI URLS aimcl SSlittuel aim Siice (ims) ¢ 
self.URLs.add(link) 
self. links _to_process.append (link) 


aLie name = Th aici os 
# This code runs when script is started from command line 
startURL = sys.argv[1] 
spider = Spider(startURL) 
spider.run() 
for URL in sorted(spider.URLs) : 
jorealiane. (UIE 


Examining a Python Program 


First, we discuss how the program is organized. Then we show you some of the Python 
features the spider uses. 


Setting up the structure 


The top of spider. py contains a bunch of import statements. Looking there gives you a 
quick idea of what the program does and how it works, because most Python programs 
rely heavily on imported modules. 


If you know the modules that come with Python, you'll know from looking at the import 
statements that this program is designed to read and process information from URLs 
(Web site addresses). Here is what each module does in the program: 


e The sys module retrieves a starting URL from the command line when 
spider.py is run as a script. (Chapter 17 covers the sys module.) 

e¢ urllib2 retrieves data from URLs.(See Chapter 20.) 

e The urlparse module slices and dices URLs and puts them back together. (See 
Chapter 20.) 

¢ htmllib and formatter are imported on the same line because they're used 
together to parse information out of HTML files; in this program, we use them 
only to get the links (anchors or HREFs). (See Chapter 20.) 

e from cStringIO import StringIoO imports an object that pretends a string is a 
file. (See Chapter 18.) 
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Tip Note the from...import syntax. We recommend avoiding from..import * 
because it can clutter up your program with conflicting names, but from... 
import syntax is appropriate when the module exports a single class, as here. 


Initializing the spider 
Down near the bottom of the program is this statement: 


aL ie name == ' main ': 


All the code after that statement runs only when you start the program from the command 
line. 


TECHNICAL In Python programs, this code often contains testing instructions. If 

STUFF ours did, typing python spider.py at the command prompt would 
cause spider.py to test itself. In our program, the code processes a 
command-line argument. You can start the program from the 
command prompt and give it a URL to examine: 


% python spider.py http://pythonfood.com/spider-test/ 


Between the top (the import statements) and the bottom (the command-line code) lies 
the meat of the module. The spider. py script contains these blocks of code: 


e Three functions: log stdout (), get_page(), and find links () 
e One class: Spider 


If you look back down at the bottom, you notice the following line: 


spider = Spider(startURL) 


This line creates an instance of the Spider class. This means the Spider class is the heart 
of this module. 


REMEMBER When trying to understand what a Python script does, it's best to start at 
the bottom and work your way up because classes and functions have to be 
created and named before they are called (before the program asks them to 
run). So in a Python program, any def statements (which create functions) 
and class statements (which create classes) always come before 
statements that actually run the code inside the functions and classes they 
define. 


Running the spider 
Figure 4-1 shows which parts of the spider.py program talk to each other (pass 


information). You might find Figure 4-1 useful while reading this section or reading the 
program itself. 


a 


Class Spicer: 


Figure 4-1: Information passing in the spider.py program. 
Reading the spider web 


In the Spider class, the process _page() method is separate from run() because run () 
is easier to understand when you don't have to look at all the code in process page() at 
the same time. This choice exemplifies two reasons behind Python's modular design 
philosophy. 
It's a good idea to separate out chunks of code that do different things: 

e Separating tasks into chunks makes it possible to use each chunk on its own. 


e Reading a chunk of code that does only one thing is easier than reading a chunk of 
code that does multiple things. 


The workings of the Spider class 


To see how the heart of the spider. py program processes information, look at the 
Spider class. To use this class, you have to create an instance of it. 


REMEMBER Instances are objects created from classes (see Chapter 13 for the details): 
e Aclass is like a template. 
e An instance is like a document created from that template. 

The parts of the Spider class perform the following tasks: 


e The init () method sets up the instance. 
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e The run() method does the real work. It's a simple loop that 
o Processes new pages (from the self. links to process list) until there 
are no more pages 
o Calls the process _page() method each time through the loop, using a 
new item from the self.links to process list. 

e The process page() method contains a loop that includes an if statement. (See 
Chapter 10 to find more about if statements and loops.) 

e The loop sends a URL to the get_page() function, which opens the Web page, 
reads it, and returns its contents. Then the loop sends the contents of the Web page 
to the find_links() function, which logs and returns a list containing all the 
links in that URL. 

e The statement if link not in self.URLs and self.url in  site(link): 
checks whether this link is already in the self.URLs set and whether it's part of 
the Web site we're analyzing. (This spider only checks links internal to the Web 
site of the original URL. If you want to check links to other Web sites, you need 
some different code here.) If the link passes the tests, it's added to the self.URLs 
set and the self. links to process list. 


Designing for expansion 


To understand another design decision in the spider. py program, look at the 
url_in_site() method. It checks whether the link is part of the Web site. 


Making url_in_site() a separate method doesn't make it easier to read the program, 
because the method has only one line of code. We decided to make a separate method for 
another reason: doing so makes it easier to subclass Spider. Subclassing is making a new 
class that shares most of the features of a class but either overrides or extends some of its 
functionality. (Chapter 13 has more about subclassing.) One way to subclass Spider is to 
extend the behavior of url_in_site() so that it works with both HTTP and HTTPS 
URLs at the same time. (The current program handles only one type at a time.) Having 
url_in_site() separate from process page() means that such a subclass is easier to 
write. This illustrates another good programming practice. 


REMEMBER When you're designing a program, think about the possibility that someone 
might want to customize or subclass your program in the future. (But don't 
overdesign or add features before they're needed. Python programs are 
easy to change, so you can quickly add features when necessary.) 

TECHNICAL The process page () method combines the URLs returned from 

STUFF find links() with the base URL specified by url. We could have 

put this functionality into the find links () function. But leaving it 
out makes find_links() more generally useful—we (or others) can 
reuse it in programs that work with relative URLs in a different way 
from this one. This is an example of the design philosophy described 
in the preceding paragraph. 
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Using Building Blocks 


This section describes how the spider.py program uses Python's most important 
building blocks and tools—functions and methods, loops, containers, names, strings, and 
error checking. 


Function and method tidbits 


The spider.py program illustrates key principles to keep in mind when writing functions 
and methods. 


Understanding default parameters 


The init () method has the parameter 1og=None. This is called a default parameter. 
(See Chapter 11.) It means that when you make an instance of the Spider class, you can 
specify a parameter for log, but you don't have to. If you don't, the instance automatically 
uses None as the value for log. 


Default parameters give you choices and make your code more flexible. Because of the 
default log parameter, a number of different options for logging can be used with the 
Spider class. 


REMEMBER The other side of the coin is that if you use a default parameter, you need 
to write code to test for the default value. In the Spider class, this code is 
partofthe init () method. It assigns the log stdout () function to 
self .log (if the user doesn't specify a value for 10g). In contrast, 
get _page() requires a log argument and therefore doesn't need to check 
for a default value. 


Recognizing bare naked functions 


REMEMBER Most function/method calls have self or a module name in front of them, 
followed by a dot. When you're reading a program, if you see a bare 
function or method (or class)—one without self or a module name—it 
usually means one of these: 


e It was defined in the module (like get_page() and find links () 
are in this program). 

e It was imported by using the from... import syntax (like 
StringIO). 


e It isa built-in function (like sorted(), down at the bottom). 
REMEMBER Functions and methods are essentially the same thing—a code block that 
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performs an action and returns a result. Here's the difference: 


e A function is by itself in a module. 
e A method is part of a class. 


Chapters 3, 11, and 13 describe how to use functions, methods, and 
classes. 


Looping around 
The spider. py script contains three loops: 


e Two for loops 
REMEMBER Python programmers generally favor for loops, because for loops 


can both assign values and provide one element at a time. 
e One while loop 


Tip A while loop is often better when you are both adding and deleting elements, 
SO we used a while loop in the section where the run () method deletes 
elements from the self. links to process list (via the list method pop () ) 
and also adds elements to the list (via process page()). 
Chapter 10 shows you more ways to use loops. 


Collections of data 


Sets, lists, and dicts are Python's data types for dealing with collections of data, especially 
if the data change while the program is running. 


Lists 
Lists (described more fully in Chapter 8) are most efficient, which is one reason we used 
alist (self. links to process) to keep track of all the links we're processing. Lists 


are also good when you need to 


e Maintain elements in a particular order. 
e Allow duplicate elements. 


Sets 


Sets are a good way of handling data when you want to ignore or avoid duplicates. 


Tip We used a set for the primary URL data in the spider class (self.URLs) because we 
wanted only one copy of each URL. Chapter 9 shows more ways to use sets. 
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Dicts 


Dicts are good for data that will be stored and accessed by keys rather than ordered 
alphabetically or numerically. 


Tip If we had wanted to associate some data with each URL in the Web page, we would 
have used a dict. Chapter 9 shows you how. 


Naming names 


If a good name is one that helps you understand what the named object is doing, then 
there are some good names and some not-so-good names in spider. py. An early draft of 
our program had functions named get_links() and find_links(). Those names don't 
really make clear the differences between the two functions, so we renamed 
get_links() to get_page(). 


Tip Programmers sometimes choose terse and not very explanatory names on purpose to 
indicate that you shouldn't pay much attention to the name because it's just a 
temporary name used to convey information (for example, it's used as an argument to 
a function or method). Sometimes a temporary name makes a few lines of code 
easier to read. Take, for example, the lines of code that use the name f in the 
find links () function: 


f = formatter.AbstractFormatter (writer) 
parser = htmllib.HTMLParser (f) 


We could have gotten rid of £ by writing the code this way instead: 


parser = htmllib.HTMLParser (formatter.AbstractFormatter (writer) ) 


But that's kind of long and hard to read, so we decided it was better to split the lines and 
use a temporary name. 


REMEMBER Give users of your modules information about which attributes, functions, 
classes, and methods they should avoid accessing directly, passing to other 
functions, subclassing, or rewriting. (Or, more colloquially, "Das ist nicht 
fiir gefingerpoken!") Sometimes this information is conveyed by using a 
single underscore character as the first character in a name, which means 
the object is private. (See Chapter 13 for more about private attributes.) 
For example, we chose to make self. links to process a private name 
because it's valid only inside the spider class. We could have made 
url in site() a private name for the same reason, but we didn't in order 
to send the message that it's suitable for overriding in a subclass. 


Managing strings 


There are a lot of ways you can work with strings in Python: 
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e Python strings have many methods built in. 


For example, we use the startswith() string method in the url _in site() 
function. (Unsurprisingly, startswith() checks whether a string starts with a 
particular substring.) 


e Many Python modules include additional string-handling functions. 


We use one of these in the process page() method: urlparse. urljoin(url, 
link). The urljoin() function (as the name suggests—see the benefits of 
naming things well?) sticks together two parts of a URL, which we pass in as two 
strings, url and link. 


REMEMBER Before you start writing some special functionality to process strings, 
check whether someone else has already done the work for you. The Cheat 
Sheet lists the most commonly used string methods. 


Handling errors 


One error-handling tool in the spider.py program is this block of code in the 
get_page() function: 


iEIEY? 8 
page = urllib2.urlopen (url) 
except urllib2.URLError: 
Gye) (Vihieicione weSiciestenaliness “ ap well) 
return '' 


This is called a try / except block (see Chapter 10). Its purpose is to catch errors from 
the urllib2.urlopen() function, which tries to open a remote URL. 


What spider.py doesn't have 


The spider.py program works as-is, but it's missing a few elements that are necessary to 
make it a fully functional Python program that follows the conventions of good 
programming: 


¢ Docstrings for each function and class method. 

e Error checking. For example, if you try to run the program from the command 
line without specifying a URL, the program fails messily. Here's what happens: 

e % python spider.py 

O TraAceoack (mosis weceme call 


e ASE) 8 

e File "spider.py", line 91, 
° in <module> 

° startURL = sys.argv[1] 


e IndexError: list index out of 
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e range 


Why are we showing a program without these features when we reiterate ad nauseum in 
this book that you should include these features in your program? Because lots of 
programmers talk about the benefits of comments, documentation, and error-checking, 
but lots of programs (some written by those same programmers) don't do as good of a job 
on those things as they should. This is a "Do as we say, not as we do" situation. 


A complete version of the program with documentation and error checking is on our Web 
site: 


http://www.pythonfood.com/ 


REMEMBER It's good practice to use try with I/O functions and user input or other 
external input. For example, we could improve our program by moving the 
body = page.read() line into the try block—because even if you can 
open a Web page, you might still have trouble reading it. Similarly, 
garbage in the HTML can cause the htm11lib module's parser to choke, so 


the line parser. feed (html) should be ina try block as well, like this: 

jEIEW? 8 

parser.feed(htm1) 
except htmllib.HTMLParseError: 

log (“imiew@re ialincliliae; Ilaalessg 3 iuie ll) 

return [] 
aime ys 

parser.close() 
return parser.anchorlist 


The sidebar, "What spider.py doesn't have," describes another spot where error-checking 
would be useful. 
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Chapter 5: Working Like a Programmer 


Overview 


Professional programmers spend as little as 10 percent of their working time writing 
code. This chapter focuses on what they do the rest of the time. These practices generally 
consume about 60 percent of a programmer's time on a project: 


e Analyzing problems 

e Designing solutions and documenting decisions 
e Debugging 

e¢ Maintaining and improving code 


Warning The final 30 percent of a programmer's time is taken by meetings and wasted 
time. (Sometimes, there's no difference between meetings and wasted time.) 


The Three Ds 


If you're writing a program that's more than a few lines long, don't just sit down and start 
coding. You need to take three steps to make your program the best it can be. These are 
the three Ds: 


¢ Document: Making sure others can read your program 

¢ Design: Making sure your program is solving the right problem and is organized 
so that it's efficient and potentially reusable 

e Debug: Making sure your program doesn't have any mistakes in it and that it 
responds appropriately when given wrong input 


Documenting 
Documentation makes it possible to read, use, and maintain a program: 
e In programming, documentation is all the writing associated with your program 
that isn't code. 
e With a language like Python, readability is a prime virtue. That means your code 
is part of the documentation, too. For example, picking good names is part of 


documentation! 


REMEMBER Programming starts and ends with documentation: 
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e Start by writing the problem that your program is intended to 
solve. 


Before you even start writing your program, it's a good idea to 
write notes about the goal of the program. Pretend you need to 
explain it to someone else—and if you have someone to read the 
notes, so much the better. 


e End by making sure that the documentation still matches the 
program. 
Warning If you don't write documentation, you will find it difficult to use or maintain 
your program after letting it sit for six months (even if you're the only person 
who uses the program). 


Designing 


Designing is actually shorthand for two intertwined parts of programming: analysis and 
design. 


What do you really want? 


Analysis is the process of determining what problem you're trying to solve. It's similar to 
the process an architect makes you go through to remodel your kitchen: "What do you 
really want?" For example, you might want to write a program to automate your data 
backups. That's a general description, but you need a more specific list of tasks you want 
the program to carry out, such as 


¢ How often do you want to make backups? 
¢ Do you want to back up all data or only data changed since the last backup? 
¢ How long do you want to keep backups? 


After you have some specific ideas of what you want, the design phase starts. 


TECHNICAL If you're remodeling your kitchen, then this is the point where the 

STUFF architect draws up a blueprint. The blueprint isn't the kitchen itself; 
it's the plan that needs to be followed to build the new kitchen. If 
you're writing a program, you start to build the outline of the 
program, which is similar to a blueprint. In fact, people who focus on 
the analysis and design of programs are often called software 
architects. 
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Pseudo-coding your thoughts 


One way to design your program is to create an outline of sorts, using what is called 
pseudo-code. You write a sketch of what your program will do, using the structures you'll 
need to use when you write the program (such as class and function definitions and if 
statements), but you don't bother with the syntax details necessary to write working code. 


TECHNICAL Pseudo-code tends to resemble real Python programs, so it takes less 
STUFF effort to convert pseudo-code into working Python code. When the 
pseudo-coding process is finished, the details are easy to fill in. 


Here's an example of pseudo-code that almost works as Python code: 


if today == sunday: 
full backup () 
else: 
incremental backup () 


Debugging 


A bug is a mistake in a piece of software that causes it to work improperly or to return 
incorrect results. 


TECHNICAL Teaching an old bug new tricks 

STUFF 
This use of the word bug was already common before software was 
invented—for example, Thomas Edison used it in 1878 to describe 
problems with his inventions. Some people will tell you, though, that 
the term derives from an incident in which an actual insect was found 
to be causing glitches inside an early computer in 1947. 


Although bug is older than writing programs, debugging has been an 
inherent part of writing software since people started writing 
software. One of the first computer scientists, Maurice Wilkes, is 
reputed to have said, "I can remember the exact instant when I 
realized that a large part of my life from then on was going to be 
spent in finding mistakes in my own programs." 


The Zen of Python 


Writing a complex program is something like designing a building. The architect's 
constraints include time, money, good taste, and the structural limits of the materials. 
Programming also requires balancing multiple needs—often including time, money, the 
feature requests of multiple groups of people, and the availability of sufficient quantities 
of caffeine. 
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Pythonista Tim Peters contributed 19 guidelines for good Python code. We consider them 
the best distillation of the Python philosophy of programming. They're on this Web page: 


http://www.python.org/doc/humor/#zen 


These are design principles, not rules to be followed blindly. They're meant to encourage 
you to think. In some cases the principles may appear contradictory. That's a reflection of 
the fact that programming sometimes requires balancing conflicting requirements. It's 
like the old saying: "Good, fast, cheap—pick any two." The principles can help you 
decide how to prioritize these requirements. 


The principles were originally written to guide the development of Python itself, but they 
also apply to writing your own programs. That's how we discuss them here. 


Some of the most important guidelines are these: 
Explicit is better than implicit. Good code is as self-explanatory as possible: 


e Use names that explain the purpose of the objects they represent. 

¢ Include comments in your code when it isn't obvious what a particular line or 
block does. 

e Avoid hidden effects. (For example, printing to the screen shouldn't erase your 
hard drive.) 


Readability counts. A good program is easy for a human to read and understand. If two 
blocks of code produce the same result, consider using the one that's easier to read. 
(Sometimes cryptic code runs faster, but speed isn't the primary goal of most Python 
programming.) 


Errors should never pass silently. If you write code that doesn't alert the user of errors, 
the errors might cause the program to give incorrect results: 


e Build explicit error checking into your code. 
e Write code to catch any errors that you haven't thought of. 
¢ Report errors to the user of your program. 


There should be one—and preferably only one—obvious way to do it. This guideline 
("There's only one way," for short) is the most popular among people in the Python 
community. This is partly in response to the motto of Perl programmers, which is, 
"There's More Than One Way To Do It." The Python community creates and popularizes 
standard coding idioms (preferred ways of performing certain tasks). Using standard 
idioms saves time because the code is already mostly written; you just plug in your data 
and variables. Python For Dummies includes many standard idioms to get you started. 


Typing import this at the Python prompt in Python 2.1.2 or higher will print the entire 
list: 
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SSS imMjoe@wie wads 
The Zen of Python, by Tim Peters 


Beautiful is better than ugly. 
Explicit is better than implicit. 
Simple is better than complex. 
Complex is better than complicated. 
Flat is better than nested. 

Sparse is better than dense. 


Readability counts. 

Special cases aren't special enough to 
break the rules. 

Although practicality beats purity. 

Errors should never pass silently. 

Unless explicitly silenced. 

In the face of ambiguity, refuse the 
temptation to guess. 

There should be one-- and preferably only 
one --obvious way to do it. 

Although that way may not be obvious at 

first unless you're Dutch 

Now is better than never. 

Although never is often better than special enough 

MEILG/lae MOM 5 

If the implementation is hard to explain, 

it's a bad idea. 

If the implementation is easy to explain, 
it may be a good idea. 

Namespaces are one honking great idea -- 

let's do more of those! 


1) 


Most computer programs have bugs. When you discover problems with the way your 


program works or the results it gives, you debug to 


e Find the source of the mistake. 
e Fix the lines of code causing the problem. 


As you get comfortable with programming and Python, you get a better sense for how the 
computer "thinks," which helps you figure out bugs more quickly. You'll become familiar 


with common causes for common problems like: 


e Syntax errors. Missing punctuation is the most common syntax mistake. For 
example, if you try to create a string but forget a quotation mark, you'll get an 


error like this: 


"Nat oY 


e >>> mystring = "No! Not the Knights who say 'Ni'! 
e Mille “<erecliinS’ Ilse: il 

e mystring = "No! Not the Knights who say 

e nw 


SyntaxError: EOL while scanning single-quoted string 
Tip You'll have fewer syntax errors in your programs if you use a Python- aware 
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text editor that prints different syntax in different colors. 

e Tips for avoiding common syntax errors can be found throughout Python for 
Dummies. 

e¢ Misspelled names. If you name an object and then misspell the name when you 
refer to it later, you'll get an error. For example: 

e >>> lumberjack = "I'm OK" 

e >>> print lumbrejack 

O Wieacesachk (mosic weceme calli lesic)) s 

° iWatike “<erecliine,. Jlauinvey il, aliar ® 

e NameError: name 'lumbrejack' is not defined 

e Using the wrong types of values. For example, if you have a number in string 
format (perhaps from the raw input () function, which converts the input into a 
string) and you try to do arithmetic with it, you'll get a result you didn't expect, 
like this: 


O SSS & = eng imjpoue (imcer € imumilseicg ™) 


O lmlieeie € mumloeiws 45 

e SSS ok we 

Ce SES 

¢ Creating infinite loops. If you create a loop by using a condition that never 
terminates, your program will try to run forever, like the brooms that the 
Sorcerer's Apprentice creates: 


e sorcerer = "asleep" 

e def make broom(): 

° print "another broom!" 
e while sorcerer == "asleep": 

e make broom () 


Tip Chapter 10 shows you how to create loops that you can control. 
e Trying to open files that don't exist or are empty, for example: 
e >>> myfile = open('foo.doc') 
O Traceoack (mosic weceme calli lasic) 3 
e rile “<siecims", line i, idm <mochile> 
QO WOmierors [Heng 2] NO Suen tile or cliimecirorys "ito, 
e Incorrect logic for Boolean expressions 


To get the scoop on correct use of Boolean expressions in your programs, see 
Chapters 3 and 10. 


The best way to find out about common bugs (other than writing your own programs) is 
to look at the bugs other people make. The comp. lang. python newsgroup or the Python 
tutor mailing list are good resources. (See Chapter 22 to find out how to access these 
resources.) The same errors come up over and over, and you'll soon be able to recognize 


them on sight. 


Tip For helpful debugging tips, see "Debugging Strategies," later in this chapter. 


Maintaining Your Programs 
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Okay, you've written your program. Now what? Chances are you'll want to make some 
changes soon. In fact, most programmers spend far more time changing existing 
programs than writing new ones. When you want to update your code, you'll likely be 
tempted to change it in a way that doesn't take a lot of time. But that might not be the best 
way to go about it in the long run. 


Suppose you've written a program to send yourself a daily e-mail with the top ten lines 
from your calendar file. It might look like Listing 5-1. 


Listing 5-1: daily calendar.py 


import smtplib # get the module for sending email 
my address = !'me@example.com!' 
headers = [ 'Subject: Daily calendar', 
"pieces “ sp iy Ackheess , 
"TOs " sh imiy ackicess, 
] # this list spans four lines for 
readability 
MEcIeS = cpein ("my Cellemelie") q 
mise = V\Ve\in! jim (Ineacleies)) a Y\ie\al az Y" .oj@slin (Ginieieies) 


smtp = smtplib.SMTP('mail') # replace 'mail' with the name of your 
mailhost 

smtp.sendmail(my address, [my address], msg) 

smtp.close() 


Then you decide you want to send yourself a bigger chunk once a week. The simple, easy 
—and wrong—way to do it, illustrated in Listing 5-2, is called cut-'n'-paste 
programming. Note how most of the lines of the program are repeated in the if and else 
statements. (By the way, another change made in the program was the addition of 
sys.argv to get a command-line argument. That's not bad programming, we just wanted 
to point it out.) 


Listing 5-2: calendar.py 


IMOORE smitjolils, sys 


my address = 'me@example.com!' 
if sys.argv[1] == 'weekly': 
headers = [ 'Subject: Weekly calendar', 
"Biceme “ <> my ackcless , 
"TOs " sr wy ackieess 
] 
MrC1es = cepem ("my Calemelaie")) . 
msg = '\r\n'.join(headers) + ''.join (entries) 


Smee = SunicjOlalo). SIME ( Vigra Il Y )) 
smtp.sendmail(my_ address, [my address], msg) 
smtp.close() 


else: 
headers = [ 'Subject: Daily calendar', 
"Biceme " + my achklieess, 
"t@Os " sh imiy ackicess , 


] 
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MEles = Cpem ("iy Calemeeie")) . ceacilaimes () | 310] 
msc = V\e\imn! ,jjoslin (ineacias)) zp Yj @ilin (emicries)) 
Sime = Suniejelatls). SIMUPE ( Vigveiat il Y )) 
smtp.sendmail(my address, [my address], msg) 
smtp.close() 


The right way to upgrade your code avoids this repetition by taking the repeated code and 
turning it into a chunk that gets called by another part of the program. This is called 
refactoring the code. 


Tip If you remember algebra, code refactoring is similar to algebraic factoring; for 
example, breaking the number 12 into its factors 3 x 4. 


Refactoring a program works like this: 


1. 


Figure out the elements that are common to all the tasks the program 
performs. 


In this example, the common element is that both tasks (sending a daily calendar 
and sending a weekly calendar) involve sending an e-mail. 


Put the common elements into one or more functions. 
In this example, these tasks go into a function called send_calendar(). 


Put the elements that are different for different tasks into an if/else block 
(or perhaps a loop). 


In this example, the if/e1se block determines whether a daily or weekly 
calendar is being sent and how many lines of the calendar to include. It passes this 
information to the send_calendar() function as arguments. 


TECHNICAL Programming to the max 
STUFF 


Smart programmers have come up with a variety of methods for 
writing better programs. Many of those methods are combined under 
the umbrella of Extreme Programming(XP). (Some programmers 
have always wanted to be as cool as skateboarders. Maybe using a 
term like this helps. Or maybe not.) 


The idea behind XP is simple: If a programming method is a Good 
Idea, take it to the max! For example, one of the best ways of getting 
bugs out of a program is to have another programmer review the 
code. Well, why not have the second programmer review the code 
while you're writing it? That results in what "extreme programmers" 
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call pair programming. 


Similarly, if writing unit tests is good, maybe you should write the 
tests before you start writing code. That's called test-driven 
development. This Web page, "Test-Driven Development in Python" 
by Jason Diamond, has an example: 


http://www.onlamp.com/pub/a/python/2004/12/02/tdd_pyunit.html 
For more info about XP, see this Web site: 
http://www.extremeprogramming.org 


Not surprisingly, there is also a lot of hype about XP. Wikipedia to 
the rescue! Here is a balanced article about Extreme Programming: 


http://www.en.wikipedia.org/wiki/Extreme_Programming 


Why refactor? 


Although refactoring will take you longer the first time you change your program, it will 
save you time in the long run, and it will also make your program more generally useful 
and easier to read. The refactored version of the calendar.py program in Listing 5-1 is 
easy to extend—all you have to do is write two-line e1if blocks. Furthermore, if you 
want to change the send_calendar() functionality, you have to change only one chunk 
of code. But if you extended the program via the cut-'n'-paste method, your program 
would get longer very quickly, and if you made changes to the code that reads the 
calendar file and sends the e-mail, you'd have to make those changes multiple times. 


A general rule to use: Refactor if at least three lines of code are duplicated in your 
program. 


The refactored program looks like Listing 5-3. 


Listing 5-3: refactored _calendar.py 
IMOOrE smitjolils, sys 
my address = 'me@example.com!' 


# set up and send the email 
clair seme Calencdarc(acdzess, mum Emecies, Suloyeere) s 


headers = [| 'Subject: ' + subject, 
"From: ' + address, 
'To: ' + address, 


] 


a3 


MmEcles = sem (my Calendar") , eeacllimes (() | sinwm emcees | 
msc = Use \ia! . jj@alia (ineaicleies)) ae UY \ie Va! ce YY sj@alia (Givers) 
smtp = smtplib.SMTP('mail') 

smtp.sendmail(my_ address, [my address], msg) 
smtp.close() 


# parse the command-line argument 
if sys.argv[1] == 'weekly': 

send calendar(my address, 50, 'Weekly calendar") 
else: 


seme Callen Gin achiress, 10, Deadly Calemelere! ) 


Good Program Design Practices 


There are a few good practices to keep in mind while you're designing and writing your 
program. They'll help with documenting, debugging, and maintaining the program later. 


Naming names 


REMEMBER The names of values and functions should be self-explanatory to make 
your code easier to read. 


For example, if you want to write code that extracts a zip code from a customer entry in a 
database, you might write this: 


CUSromer Zid = GSE Zailjo (Pamdora) 


It is easier to figure out what the preceding line of code is doing than it is to figure out the 
following code: 


Z = getz (Pandora) 
Following conventions 


Python itself doesn't require you to write your code this way, but these conventions make 
programs easier for programmers (and you) to read. 


Tip The conventions are available on the python.org Web site under the name "PEP 8: 
Style Guide for Python Code." See them here: 


http://www.python.org/dev/peps/pep-0008 
The following conventions are the most critical guidelines. 
Naming rules 


Follow these guidelines when naming objects in your program: 
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e Give modules and packages short, lowercase names without underscores: 
e likethis 
TECHNICAL The name of a module or package is also the name of a file on 
STUFF the computer. Some computers don't handle long filenames or 
names with special characters. This practice makes it easier 
for users of other operating systems to use your program. 
e Start class names with a capital letter and use embedded caps: 
e LikeThis 
TECHNICAL This format is sometimes called "StudlyCaps". Because, well, 
STUFF geeks have to feel studly about something. 
e End the names of error-type exceptions with Error. 


This helps anyone reading or debugging your program to recognize exceptions 
right away. 


e Name functions and methods by using all lowercase characters. 
Underscores, like this, are OK. 


TECHNICAL Using different styles for names of different types of object 

STUFF helps distinguish functions and methods from modules, 
packages, and classes at a glance. The underscore character 
helps people read the name of the function or method if the 
name includes more than one word (for example, get _zip is 
slightly easier to read than get zip). 


e Use all uppercase characters for constants, LIKETHIS. 


Formatting rules 
To improve readability, follow these guidelines when formatting your code: 


¢ Indent each code block four spaces. Avoid tabs. 
e Continue long lines by surrounding them with (), [], or {}. 
e¢ Continue strings by ending the line with \. 
o Use \ to continue strings only when necessary. 
o Make sure to use an extra indent for continuation lines, like so: 


fo) parrot = "This parrot is no more! He has ceased to 
be! He's expired \ 
S and gone to meet his maker!" 


e¢ Put spaces before and after =. 


Warning But don't use spaces around = in default parameters of functions and 
methods. 


a 


Don't forget to comment! 


Include comments as you write the program. Comments should describe why the code is 
written the way it is and not just what it does. 


Chunks of code, such as functions, are usually preceded by comments that describe the 
whole chunk (not comments that go with each line). 


Tip If you've added comments to the program, collect them together when it's time to 
write 


e The main comment block at the beginning of the program. 


This block includes the type of file, the name of the programmer, the date of 
creation, the purpose of the file, and usage instructions. 


e The docstrings that are used to build the program's help file. 


Debugging Strategies 
Some strategies that programmers use to debug their code are as follows: 


e Three built-in functions — repr (), type (), and dir() — aid debugging by 
helping you figure out what values and types your program uses. 


See the following section, "Built-in functions," to find out about these three useful 
functions. 


e Traceback logs, print statements, and commenting out lines of code help you 
keep track of what's going on when your program runs. 
¢ Debugger tools let you step through your program one line, or one code block, 
at a time. See "Using a debugger" later in this chapter. 
The following sections describe ways to use these strategies effectively. 


Built-in functions 


Your three best friends in debugging are the built-in functions repr (), type (), and 
dir(). 


The repr () function gives you useful information about what Python really thinks is in a 


value; it is essential for debugging problems with strings because it shows the special 
escape characters in strings, like this (see Chapter 6 for a full explanation): 
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>>> mystring = "the knights who say 'Ni!'" 
>>> repr (mystring) 
"the knights who say \'Ni!\'"™! 


The type() and dir() functions give you additional information about a value. Even 
when the bug in your program is a logic error, the symptom is often a wrong value 
getting passed around. Finding out what that value is and where it came from usually 
allows you to figure out the bug. 


Here's a trivial case. Python's exception points you in the right direction for figuring out 
the error: 


SSeS PICS = wary ampoule ("MmeSe riees ”) 
Ihimeeue jorenkees i299) 
PrS City = ieee _alidjene ("Eimnieeie Gfbyeiinicalicyys  ) 
Enter quantity: 3 
>>> price * qty 
Traceback (most recent call last): 
wile “<siechias”, Iitine i, iin <mochiile> 
TypeError: can't multiply sequence by non-int of type 'str' 


The error message is clear enough, but it doesn't tell you what the "sequence" and the 
"non-int" are. What you might do next is check what price and qty are by using repr () 
and type (), as shown in the following example code. (So where's repr ()? In interactive 
mode, if you just type the name, Python automatically uses repr (). You need to actually 
type repr () only when you're debugging in scripts.) 


>>> price, type (price) 
(VILA. 99", eye Vsicie >) 
>>> qty, type (qty) 
('3", <tyiae “siti! >) 


Ah-ha! Both price and qty are strings. (That's because the raw input () function 
always returns a string.) In order to multiply them, you need to turn them into numbers. 
Here's how you do it: 


>>> price = float (price) 
>>> qty = int (qty) 

SSS joules * Ec 

33s JOVSISISIIOIY Oy) 


Now suppose the user enters something that can't be turned into a number. This will 
create another problem in the program: 


SSS jeieeS = Llloeir (en _ainjsmie ("Emre jorieee '))) 
Enter price: Good dog! 
Traceback (most recent call last): 

pile Y<siechias”, liine 1, iin <inochile> 
ValueError: invalid literal for float(): Good dog! 


Tip The way to handle this problem is to write code that catches the exception and 
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reminds the user to enter a number. We discuss how to do this in Chapter 15. 
Print statements and traceback logs 


When debugging, add print statements whenever you give a name to a value or 
manipulate something. You'll quickly discover at what point the name takes on the wrong 
value. 


Warning Don't suppress tracebacks (Python's information about errors or unusual 
conditions it encounters). If you want your program to continue after an error 
has occurred (instead of quitting), /og or print the traceback and go back later. 


Although print statements are useful in simple programs, wading through the output is 
difficult if you have a lot of them. It's also difficult to selectively turn print statements 
on and off. The logging module directs your debugging output to a file for later perusal. 
To control the amount of output, just change the logging level. 


Tip Chapter 17 shows how to use the logging module and also shows how to use the >> 
operator with print to redirect output to a file. 


Comments 
To use comments for debugging, add lines to the code that print the values of various 


names throughout the program. Then append a comment like # for debugging to these 
lines of code so you know to remove them later. 


Tip Using the comment character (#) at the beginning of a line of code disables that line. 
In some cases, disabling individual lines helps you track down which line of code is 
the source of a problem. 


Using a debugger 


A debugger lets you step through your code one line at a time or one code block at a time. 
It displays all the values Python keeps track of, showing exactly where values change. 


pdb 
The pdb module is a debugger that's built into Python. 


TECHNICAL The program we're debugging is called x. py, and it looks like this: 
STUFF 


print "This function creates a list." 
deft makelist(): 
ave) 
HOw a din wainge (il, 20) 3 
a.append (i) 


joieaince “eyojosinchiing', a, YS", ei 
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return a 
makelist () 


To use pdb from the command line, follow these steps: 


1. 


At the command prompt, type python -m pdb x .py, where x .py is the name 
of your module. 


You'll see something like this: The (1) on the first line indicates the first line of 
the program. The text following -> is the actual code of this line. (Pdb) is the 
prompt. 


> /Users/stef/x.py (1) <module> () 
=> joiilinic “MMA iblaciciem Cieeerces a llisie.” 
(Pdb) 


To run the current line of the program and go to the next line, type s. 


Each time you press s (which is short for "step"), you see the next line number 
and its code. In this example, you also see the result of the print statement on the 
first line. 


(Pdb) s 
Widals) i wIMNGEICin Crees a ilisie. 
> /Users/stef/x.py (2) <module> () 
= GIEIE imnalcell asic () 3 
(Pdb) 


Tip If you type s again at this point, the program jumps to line 8: 


(Pdb) s 

> /Users/stef/x.py (8) <module> () 
-> makelist () 

(Pdb) 


That's because the first time Python sees a function definition, it reads (defines) 
the whole function. This process is considered a single step. 


When you want to check that the program is storing the correct values, type 
p and the name to see its value. 


We got this result after we'd stepped through the program ten times: 


(Rolo) (5 a 
[tp 2) 


Tip To see available commands while you run pdb, type help. For help on a single 
command or topic, type help and the name of the command or topic, like this: 


(Pdb) help a 
a(rgs) 
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Print the arguments of the current function. 
The most useful pdb commands are: 


¢ s(step)—Run the current line of the program and go to the next line. 

e Ifthe line is a def statement for a function, define the function (if it hasn't already 
been defined), then go to the first line after the function. If the line is a function 
call, go to the first line inside the function. 

¢ n(next)—Run the current line of the program and go to the next line. But if the 
line is a function call, run the whole function and go to the next line after the 
function call. 

* c(cont)—Continue running the program until the next break point (see the 
following section, "Break points"). 

¢ x (return)—Run the program until the current function returns. 

¢ 1(1ist)—Display the source code around the current line. 

¢ a(args)—Display the current function's arguments. 

¢ p(print), pp (pretty-print)—Display the value of an expression. 

¢ q(quit)—Quit the debugger and the program. 


Break points 


If you know that most of your program is working but you suspect a problem in one part, 
you can set a break point where you think the problem is. In pdb, you do this by typing b 
and the line number you want to set a break point at: 


(clo) 19 3 
Breakpoint 1 at Users/stef/x.py:3 


In pdb, to run the program until it comes to the break point, type c (for "continue"). To 
see the values your program is storing at this point, type p and the name whose value you 
want to see. 


The following example sets a break point at line 5 of our program, runs the program until 
it reaches the break point, and examines the values at that point. Note that the break 
occurs before Python runs line 5. 


(Peis) Ip 5 

Breakpoint 2 at /Users/stef/x.py:5 
(PyClsy)) € 

aysjoeimching; 2 3 [Lil, 2] 


> /Users/stef/x.py(5)makelist () 
-> a.append(i) 


(2cls)) 7 a 
[ip Zl 
(Belo) je at 
3 
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Part II: Building Blocks 


Chapter List 


Chapter 6: So This String Walks into a Bar 
Chapter 7: Counting your Way to Fun and Profit 
Chapter 8: Processing Lists and Tuples 

Chapter 9: Diving into Dictionaries 


The 5" Wave By Rich Tennant 


"Ms. Gretsky, tell the employees they can have internet games on their computer again." 
In this part... 
You discover how to manipulate Python's many data types, from simple text strings, 


through all the different kinds of numbers you can work with (but don't worry, you won't 
have to remember high school algebra), to lists, tuples, dictionaries, and sets. 
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Chapter 6: So This String Walks into a 
Bar 


Overview 


In most computer languages, including Python, a string consists of text characters—just 
one character, a few lines, or a whole text file. In this chapter, you find out how strings 
work in Python, including 


e The various ways to use quotation marks to delimit strings 
¢ How to use index numbers and slices to get at parts of strings 
e Magic ways of formatting string output 


We also introduce you to the mysteries of Unicode. 


Stringing Them Along 


Inside Python, a string /iteral is surrounded by quotation marks, which distinguish it from 
other kinds of data, such as integers or names. You see the quotation marks when you 
type the name of a string in the interpreter. But when you print a string, the quotation 
marks don't appear. 


>>> y = '234! 

>>> y 

W273 Ae 

SS jOIeIuME 

234 

TECHNICAL The Python interpreter calls repr () to display the result of an 

STUFF expression—this display itself is usually a valid expression. The 
print statement uses str (), which displays a "prettier" version. 
(Chapter 2 has more details.) 


The following sections describe several ways to quote strings and how to use special 
characters inside strings (or how not to use them). 


Just the quotes, ma'am 


You can surround a string with single, double, or triple quotes. Which you use depends on 
the string contents and formatting you want. 
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One quote or two 


When you create a string, Python lets you to use either a pair of single quotes or a pair of 
double quotes. Both mean "a string is inside." This example shows a double-quoted string 
and a single-quoted string: 


>>> penguin = "on top of the television set" 
SSS wideie aie sis closing; = 


If your text includes single quotation marks or apostrophes, it's easiest to make it into a 
string by surrounding it with double quotes, like this: 


>>> knights = "We are the Knights Who Say 
SSS joie insoles 
We are the Knights Who Say 'Ni'! 


Likewise, if your text includes double quotes, it's easiest to make a string by surrounding 
it with single ones. 


Triple-scoop 


To make a string that prints exactly as you type it, use triple quotes, either single (""") or 
double ("""). Python prints the string with line breaks and spaces exactly as you enter 
them, as shown: 


>>> spam =" 
spam spam spam spam spam 
and spam 


Woy 


>>> print spam 


spam spam spam spam spam 
and spam 


Triple quotes are most commonly used to create docstrings. A docstring is a short 
description of a code block, such as a function. Python's help function automatically 
formats and returns the text of the docstring when you ask for help on the code block. For 
more about docstrings, see Chapter 11. 


Tip Triple quotes are also useful for surrounding text that has both single and double 
quotation marks. 


Ways to escape 


To tell Python to give special treatment to a character in a string literal, precede it with a 
backslash character (\), also called the escape character. 
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Table 6-1 lists some of the characters that do special things when preceded by a 
backslash. (These are commonly called escape codes.) 


Table 6-1: Some Python Escape Codes 
=™>Open table as spreadsheet 


Character Meaning 
\ as last character on a line Text on the next line goes with the text on this line. 
\\ Backslash 


A" Single quote 

a Double quote 

\e Escape key 

\n Linefeed 

\t Tab 

\O nn Octal character (mn is a two-digit number) 

\x nn Hexadecimal character (nn is a two-digit number) 


One way to use the backslash is to tell Python to treat a quotation mark as a regular 
character rather than as a "string starter or ender," as in this example: 


>>> whiteRabbit = 'Where\'s my pocket-watch?' 
>>> print whiteRabbit 
Where's my pocket-watch? 


Tip It's easier to read strings that don't have escape characters within the code, so avoid 
them if you can. If you have text with both single and double quotes, consider using 
triple quotes to designate it as a string. 


To tell Python that a backslash is part of your string and not an escape character, precede 
it with a second backslash, as shown: 


>>> path = "C:\\Applications" 
SSS jOwiinie jOeiela 
C:\Applications 


Raw strings 
To tell Python not to look for any escape codes in a string, specify the string as a raw 
string. You might want to do this when handling Windows pathnames, which include the 


backslash character. 


Tip Raw strings also simplify regular expression searches, which also use backslashes as 
special characters. See Chapter 18. 


To specify a raw string, type r before the first quotation mark, like this: 
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SSS jetta = c"Cs \VAoplicaicions” 

>>> print path 

C:\Applications 

Warning There's one case in which raw strings aren't completely raw. It's an error to end 
a raw string with a backslash because Python thinks you're using the backslash 
to escape the quote mark that ends the string. To type a string that ends with a 
backslash, you must use a regular string. You can add it to the raw string simply 
by typing it on the same line. If you type several strings on the same line, you 


don't need to use a concatenation operator: 
SSS joint ws \Wlaica\myorcocems! Y\\! 
C:\data\myprograms \ 


Being wordy 
There are several ways to create a string that will span more than one line. 


e To write a string that will print exact as you entered it, surround the string with 
triple-quotes (""" or """). 
¢ To write text that will print on several lines, add the linefeed escape code \n 


where you want a new line to start, like this: 
e >>> cheese = "cheeses available: \nsavoyard\nboursin" 


e >>> print cheese 

e cheeses available: 

e §6savoyard 

e boursin 

e To turn two strings on two lines into a single string, surround them with 
parentheses. We recommend this method for entering long strings because it's 
easy to read. For example: 

OO Se se = (Mareuliloy 

cog §y@realiel”) 

O SSS joerc xx 

e = helloworld 

e To create a long string, type a backslash character (\) at the end of a line. What 
you type after you press Return counts as part of the same line. 

e >>> longline = "this is a very long line a very long \ 


eng Maliges ei wesneyy Ieyave), lLatiaves\” 
e >>> print longline 
e this is a very long line a very long line a very long line 
Warning The end-of-line backslash doesn't work if you add spaces after the 
backslash. 


How a string looks inside Python 
To see how Python internally represents a string, type the name of the string and press 


Return. In this example, Python represents a newline character with the escape character 
\n: 


>>> modernlife = "as i was surfing on the air\n\ 


85 


i saw a spam that wasn't 
>>> modernlife 
"as i was surfing on the air\ni saw a spam that 


When you ask Python to print a string, it interprets the newline character. So the 
modernlife string is printed on two lines, like this: 


>>> print modernlife 
as i was surfing on the air 
i saw a spam that wasn't there 


"Please repeat": String operators 


Several of Python's operators work with strings. This section covers the following 
operators: 


¢ Concatenation (+): Sticks strings together. 
¢ Repeating (*): Copies a string several times. 
e Size testing (<, >): Determines which of two strings is bigger or smaller. 


Combine and repeat 


To concatenate strings, type the first string, the + symbol, and the second string, like so: 


>>> hello = "hello" 
>>> world = "world" 
>>> hello + world 
"helloworld' 


Tip If you're working with string literals rather than strings that have names, you don't 
have to use the + operator to stick them together. You can just put them next to each 


other on a line, like so: 
SSS Mic Choy! wien €) iclmilioe; 7 Malic aie geile COE icloeie Syyatjave;” 
Witic (choim! wien ei iclmaiiove; akic akic @ialiaVic Gio ‘clavate jsyifaliave;™ 


To repeat a string, type the string, the * symbol, and the number of times you want to 
repeat the string, as shown here: 


SSS Wie clon! wreein €) ‘eldsline; Y ~ 3 
"Tt don' mean a thing It don't mean a thing It don' mean a thing" 


To concatenate and repeat at the same time, combine the operations, like so: 


>>> toasterl = "would you like some toast? " 
>>> toaster2 = "or how about muffins??? \n" 

SSS aiMiNCwaling) easier = (tOasineiell + CoOesieeez)* Ss 
>>> print annoying toaster 


would you like some toast? or how about muffins??? 
would you like some toast? or how about muffins??? 
would you like some toast? or how about muffins??? 
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The augmented assignment operators += and +* also work with strings. These operators 
let you perform an operation and give the result to a name at the same time. They are very 
useful in loops because they make the assignment statement easy to read. 


REMEMBER The + and * operators work differently on strings than they do on numbers. 
You can't add a string and a number together—you must first convert the 
string to a number (for example, by using the int () or float () 
functions). And if you try to use * on a string and an integer, you get a 


repeated string, not multiplication, like this: 
Pee IB) eG) 
UDSIIOS ! 
SSS aime (45")) ar 6 
ileal 


Mine's bigger than yours! Comparing strings 

When you're working with numbers, it's usually obvious what the smallest and largest 

numbers are. But it's less clear what smallest and largest mean when you're talking about 

characters. 

TECHNICAL A character's ASCII or Unicode value (a number) determines the 

STUFF character's order. So it would also be correct to talk about one 
character coming "before" (smaller) or "after" (larger) another 
character in a numbered list of characters. The following list 


describes how Python orders (assigns sizes to) characters: 


e Letters at the beginning of the alphabet are smaller than 
letters at the end. 


Ais smaller than z. 


e¢ Capital letters come before (are smaller than) lowercase 
letters. 


y and z are both smaller than a. 
e Digits are smaller than letters. 
5 is smaller than a. 
e Most punctuation marks come before digits or letters. 
! is smaller than 5. The exceptions are curly braces ({ }), the 
pipe character (|), and the tilde (~), which are larger than 


letters. 


To determine which of two strings is larger, use the < and > operators. 
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SS WAY < VEY 


True 
Pees ey a Mie 
True 
Bee Vey Se Vial 
True 


Looping through strings—You can, but why would you want to? 


Because a string is a sequence data type, you can use it in a loop; that is, you can 
iterateover it. But there are very few reasons to do this. In fact, code that does this 
probably has a bug. Python doesn't have a built-in error or warning message for iterating 
over a string, but external debugging tools such as PyChecker often flag it as a bug. 


The PyChecker tool looks for common problems in Python code. It doesn't come with 
Python, but you can download it from http://www.pychecker.sourceforge.net/. 


A few more methods for working with strings 


You use a string method to perform actions on a string. To use a string method, type the 
name of your string, a dot, and the method. Then inside parentheses, pass any parameters 
that the method needs. In this example, the parameter is the string you want to count the 
occurrences of: 


>>> mystring = "mississippi" 
>>> mystring.count('s') 
4 


Testing the content of strings 


Python lets you test whether strings have certain content. These tests return True or 
False (Boolean results). 


To test whether text is anywhere in a string, type the text you're searching for in quotation 
marks, followed by in and the name of the string. This example gives a name to a string 
and tests for the string 'goose' inside the string. 


>>> mystring = 'the quick brown fox' 
>>> 'goose' in mystring 
Balse 


To test whether text is not in the string, use not in rather than in. 


Tip Python 2.2 and earlier don't have the in keyword. In those versions, you can test for 
text in a string by using the find() or count () method. 
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To test whether text is at the end of a string, use the endswith() method, as shown here. 
(The startswith() method works the same way but finds text at the beginning of a 
string.) 


>>> mystring.endswith('fox') 
ilsranene) 


Converting data to a string 


To turn data other than a string into a string, use the built-in function str (), as in this 
example, which converts an integer to a string: 


SSS sei (2345) 
UD BANS) 


Sorting a list of strings 


To sort a list of strings, use the sort () method of lists. Type the name of the list, a dot, 
and sort (), like this: 


SSS ulyllast 

SS mlyilis 

>>> mylist 

[Vitoxiemoic",, Virzingo’, Vweisikey' J 

Tip Because small letters are "bigger" than (come after) capital letters, if you need to 
alphabetize some text, you should convert it to all lowercase first. To convert a string 
to lowercase, type the name of the string, a dot, lower, and parentheses, as shown: 


— whiskey tang @its . hOxb iO || 
5 Sone. (()) 


(ar (eu teu 


SS Ok = Malas, IS) JA SaURIING 
>>> x.lower() 
Vielmale; als él 


Finding out more about string methods 
Tip For more information on string methods, type help (str) at the Python prompt. 


The string methods work with both regular strings and Unicode strings (see the upcoming 
section, "Unraveling Unicode"). They include the following types of actions: 


e Case manipulation (capitalize, upper, lower, swapcase, title) 

¢ Counting (count—number of times substring is in string) 

e Manipulating text encoding (encode, decode) 

e Search and replace (find, replace, rfind, index, rindex, translate) 

e Tests that return Boolean (True or False) values (startswith, endswith, 
isalnum, isalpha, isdigit, islower, isspace, istitle, isupper) 

e Joining and splitting (join, partition, rpartition, split, splitlines) 

° Formatting (center, ljust, lstrip, rstring, rjust, strip, zfill, 
expandtabs) 
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Tip The Cheat Sheet attached to the front of this book briefly describes how most of the 
above methods work. 


The string module versus the str type 


Early versions of Python implemented string operations as functions in the string 
module. In Python 2.0, most of those functions were duplicated as methods of the str 
type. The string module still works, but use of the string module usually indicates an 
old program (except for string.maketrans (), for which there is no equivalent method). 


Cat's Cradle: Indexing and Slicing 


Strings (and other sequential data types) are divided into pieces called elements that are 
stored in order (in a sequence). In a string, each character is one element. You manipulate 
the individual elements by using operations called indexing and slicing. 


An index number specifies the location of a particular element, and a range of index 
numbers specifies a slice of several elements. 


This section describes the basics of indexing and slicing. You also find out some shortcuts 
and discover how to use slices to make copies of strings. 


REMEMBER Strings can't be changed. But you can use slicing to make a new string 
based on part of an existing string. 


Basic syntax 


You use indexing to find an element of a sequence object (such as a character in a string) 
based on the element's index number. You use slicing to find a range of elements. 


Indexing 


To find an item corresponding to a particular index number, type the name of your string 
(or other sequence object) followed by an index number in brackets. Note that the first 
index number is 0. This example finds the third character, which has the index number 2: 


>>> mystring = "truly, madly, deeply" 
SSS mlySsicicsling || 2 || 
aay 


Slicing 
To find items corresponding to a particular slice, you use the name of the string and 


brackets, as above. Inside the brackets, you enter a s/ice expression: the beginning of the 
slice, a colon, and the end of the slice. The end of the slice means "up to but not 
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including" this element. In the example below, five elements are returned (index numbers 
7, 8, 9,10, and 11). 


Here's an example: 


SSS amysiciealing |[ 7 312] 
'madly' 


A step (sometimes called a stride) skips over some items in the sequence. To specify a 
step, type a second colon and the step number, like so: 


SLl> mySiticiline = YIZS456 7/89" 
Say my SiewaLing || 0332] 
VLS5 TQ? 


Steps aren't supported in versions of Python earlier than 2.3. 
Figuring out the tricks 


Following are some shortcuts you can use with slicing and indexing syntax. The 
examples all use this string: 


alpha = 'abcde' 


e The beginnings and ends of slices are like notebook dividers—they sit between 
the elements. 

e Ifyou are counting from left to right when indexing or slicing, the first index 
number is 0, not 1, as shown: 


e =6>>> alpha = '‘abcde' 
e =6>>> alpha[0] 

e Weyl 

O SS> aljsine. [0s 3] 

OQ Failexe:" 


Tip When slicing, the first index defaults to 0. If you leave out the first index, 
Python uses the first item in the sequence, like this: 


SSS ailljsida || 32)] 
Valo! 


e When slicing, the last index defaults to "the length of the sequence." You can 
access the last item when slicing by leaving out the number after the colon, like 


this: 
O SSeS alljome|ils] 
e §63©'bede'! 


Tip If you use a number larger than the size of the sequence, "the length of the 
sequence" replaces it. 
e = 6>>> alpha[3:10] 
oO NieieY 
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e¢ However, this doesn't work with indexing. If you use a number larger than the 
size of the sequence, Python raises an IndexError. 

e >>> alpha[10] 

O Wraceoach (mos: weceme calli lasic) s 

e Mile “<sieciims”!, lime i, sim <mochile> 


e iIndexError: string index out of range 

e Ifyou are slicing, and the second index number is smaller than the first, you get 
an empty sequence (for example, an empty string). 

O  SS> gullet || 332] 

Po ea 

e You can also count from right to left when indexing or slicing. (This is called 
negative indexing.) When indexing, the first index number, from the right, is -1, 
the next is -2, and so on—but the leftmost index is still 0. 

Oo SS> alpina |=] 

e y e U 

e Ifyou are making a slice, however, -1 points before the last element. To get at the 
last element, you need to use ":]": 


O = Se> alpina =ss—l] 

oO Neel 

e = =6>>> alpha[-3:] 

O Viele 

e To get the whole sequence as a slice, use [:] (which makes a copy). 

OQ SS ailljoiaa || 2] 

OV eilsyercle" 

e Specifying a negative step when slicing is one way to reverse a string. 
OO SSS Vinee" [ls 38—ib]| 

° "olleh' 


Figure 6-1 shows the relationship of index numbers and elements in a string or other 
sequence object. 


End of sequence 


Segineing of sequence 
Figure 6-1: Relationship of indexes, slice indexes, and elements in a sequence object. 
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Changing the immutable string 


Strings are immutable (cannot be changed). But you can use slicing and the + operator to 
make a new string, using some of the content from an existing string. To make a new 
string by slicing, follow this example, which uses slicing to take all but the first four 
characters of the candy string and concatenates it to the string 'milk': 


>>> candy = 'dark chocolate bars' 
SSS ileiss@eiachy = Vialilic! zp Geiachy [4s | 
>>> print morecandy 
milk chocolate bars 


Interpolating Between the Lines 


To format the output of strings, you can use the string methods mentioned in "Finding out 
more about string methods," earlier in this chapter. But there is another, more common 
way to format strings: the operator. When used with strings, this operator is sometimes 
called the interpolation operator. 


When we use the word formatting, we aren't referring to word processor—type formatting 
—changing fonts, colors, styles, and the like. We mean specifying what happens when a 
value is inserted into a string. For example, we might specify how many digits to display 
to the right of a decimal point. 
The interpolation operator also lets you insert data in specific places in a string. This 
means you can calculate your data in one part of your program and print it or save it to a 
file in another part. 
Using the interpolation operator 
You can use the % operator in several ways: 

e To include a single data item in a string 


e To include multiple data items in a string by using a tuple 
e To include multiple data items in a string by using a dictionary as a mapping key 


Meanings of the formatting codes 


A formatting code consists of the + character followed by a character called a conversion 
type. Each character stands for a particular type of data to be included in a string. 
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The %s formatting code (to indicate string data) is the most popular code, because it 
works on almost all types of data—if you use something other than a string as your data, 
Python converts it to a string automatically. 


Here are some characters that stand for other kinds of data: 


%c: Single character (works with an integer or single-character string) 

3d, i: Signed integer decimal 

%f, F: Floating point decimal 

sr: A repr () representation of the data 

%%: Percent sign; used with other codes as an escape character for a real percent 
sign, as in the example below: 

SS> VSO OF inotlauline a6. Sita Si” % © 

U5OS Oi iaenelading; als Siewili 0)" 


Optional modifiers of the formatting code 


You can add several optional modifiers to codes. You must type the formatting 
modifiers and code without spaces and use this order: 


1. 
pas 


6. 


The % operator 

A mapping key in parentheses (described in "Formatting with a dictionary," later 
in this chapter). 

Conversion flags, so-called because they affect how some kinds of data are 
converted when they're included in the string. They mostly specify how to handle 
+ and -- signs. 


To find out more, see the FORMATTING topic in Python's help utility by typing 
help ("FORMATTING"). 


The minimum amount of space to allow for the data (minimum field width)—a 
positive integer. 

How many digits to include to the right of the decimal point of a decimal number 
(precision)—a dot (.) followed by the precision value. 

The formatting code 


The first and last of the preceding steps are required; the rest are optional. 


This example prints pi to a precision of 10 decimal places: 


>>> import math 
>>> "The value of pi is about %.10f" % math.pi 


"The value of pi is about 3.1415926536' 
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Formatting with one data item 


If you have a single data item (for example, mystring = "swallow") and you want to 
include it in a string, follow these steps to write the code to do it: 


BW NOR 


Type your string up to the point where you want your data to go, like this: 
>>>> "An African or European 


On the same line, type % followed by the formatting code; for example, 
>>> "An African or European %s 


In the above example, %s indicates that you will insert string data. 


On the same line, type the rest of your string, a space, another %, a space, and 
the data item that you want to include. 


Your final line might look something like this: 


fo) 


>>> "An African or European %s?" % mystring 
"An African or European swallow?' 


Formatting with multiple data items in a tuple 


To include multiple pieces of data in a string, you use multiple formatting codes and pass 
the arguments in a tuple (that is, inside parentheses, with the values separated by 
commas; see Chapters 3 and 8 for a refresher on tuples). 


This section works with line 4 of the following code, which calculates and prints a table 


of powers: 
joneatione “Vey Soe ila ey (Uae Versa 25 Verensicosel) 
pouestione, MSM ae Ze 
IE@Ie Xk sin swesMNGS (A, G)) 3 
joreriog “S2Cb Goch Si2el” S Gee KSS2, see) 


To insert multiple data items into a string, follow these steps: 


1. 


Type print, followed by a formatting code for each item you want to include, 
like so: 


, peace "S2S BS Sl2s" 


The numbers in this formatting code specify minimum field widths. 


On the same line, type a space, 3, a space, and a tuple that includes one data 
item for each formatting code in the string, as shown here: 


5 jetealiome Wows: Soe Biles fy) (Ue Vets ail 2 Msi hore) 


The code prints the following results: 
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1 il il 
Z 4 4 
3 8) 2 
4 16 256 
5 ZS S129 


It's easy to forget to use tuples when formatting with multiple data items. If you 
remember to do this consistently, you'll be ahead in the debugging game. Even 
experienced Python programmers make this mistake, so if you see an error message like 
this, you're in good company: 


So Ve esl Viele 2 Valencia! 
Traceback (most recent call last): 
Pile “Wesiechiias”, lime i, im <mochwle> 


TypeError: not enough arguments for format string 


Here's the right way to do it: 


Ss> Ys SS & (Mae! Vibeiceln) 
Melane lkewecila" 


Formatting with a dictionary 

You can get data out of a dictionary and include it in your string by using the % operator. 
You use a mapping key to stand for the value to be used from the dictionary. Follow these 
steps to create a line of code that gets the value of the mathematical constant e from a 


dictionary and prints it as part of a string. 


1. Type your string up to the point where you want to include your data, like so: 


2. "The value of e is approximately 

3. Type % and (in parentheses) the dictionary key for the value you want to 
include. 

4. "The value of e is approximately %(e) 

5. Type the formatting code (so that the whole chunk of code looks something 
like this): 

6. "The value of e is approximately %(e) £ 


7. Type the rest of your string, a space, the % operator, a space, and the name of 


the dictionary. 
8. "The value of e is approximately %(e)f " % mydict 


Here's the line of code in context: 


SSS mei Conse = {eis S141592, Vets 2. 7is2e2 } 
SSS “ne yellue Of © 18 dppwexamarely B(E)i" % mail Comsic 
"The value of e is approximately 2.718282' 
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A formatting example using string methods 


You can also use string methods to write code that prints a table of powers, similar to the 
one in the previous section, "Formatting with multiple data items in a tuple." The 

rjust () method right-justifies the string within an area the size of the number of spaces 
specified in its argument. 


Note that this code has more lines than the code written with the % operator and tuples. If 
you're concerned about reducing the size of your programs, the % operator might be the 
way to go. 


x = 'x! 
vy = Vy5¢%*D! 
Time WM eXGN AEX 
FONCTION OX, Were (4) , wate wise (6) 
poneaialic. “MSU es: idl 
OIE i Lin eSinee (il, 6) s 
igiay = Sieie (ins 2) 
iniaiay = <Sieie (iain) 


joanne id, ini ejwisie (4), iminial, e7usic (6) 


Here's what it prints: 


be ere ROSES 
il il i 
2 4 4 
3 ) ZY 
4 16 26 
5 DS) S125) 


Unraveling Unicode 


So far, you have been using strings that work with English text (and some Western 
languages, like French and German). But what if you need to use Russian or Japanese? 
Python 2.0 introduced a special string type called Unicode strings that let you manipulate 
strings in any language or alphabet. 


Unicode strings work mostly the same way as regular strings. Here are a few of their 
unique features: 


e Unicode strings in Python are specified with a lowercase u. 
e In most installations of Python, a four-digit hexadecimal number represents each 


Unicode character. 


Regular (ASCII) characters use a two-digit hexadecimal number. 
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Creating a Unicode string 


To create a Unicode string, precede the string with u, like so: 


PES 


= u"Fruit bats pollinate the flowers of many fruits." 


To create a Unicode string that uses a non-ASCII Unicode character, follow these steps: 


1. 
oe 


a 


Type u to specify a Unicode string and then start typing the string. 
u'Libert 


To specify a character that you want to represent by using Unicode, type \u 
and the character's hexadecimal value, like so: 


. u'Libert \u00e9 


Type the rest of the string and the ending quotation mark: 


. u'Libert\u00e9 ' 


The value 00e9 stands for the é character. 


TECHNICAL What is Unicode and why do we care about it? 
STUFF 


When people started teaching computers how to talk, the computer 
geeks came up with a system called ASCII (American Standard Code 
for Information Interchange). ASCII characters are encoded by using 
7 bits of information, which allows for only 128 (0-127) characters. 
That wasn't enough for non-English languages, accented characters, 
and other common symbols. 


Enter Unicode. Unicode will eventually be able to encode every 
character set, including those for non-European languages, alphabets 
used by scholars, and mathematical and linguistic symbols. See 
http://www.unicode.org to find out more. 


TECHNICAL Location, location, location 
STUFF 


How Python stores and displays your Unicode string depends on 
your locale, which is Python's idea of what language and character 
set you're using. The locale also determines the default character 
encoding that Python uses. The encoded string must match that 
encoding. 


Here's how to see your default locale and encoding: 


>>> import locale 
>>> locale.getdefaultlocale() 
[sin US’, “wie | 
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Locales are tricky because all (repeat, a//) character input and output 
must go through the encode/decode process. When you print a 
Unicode string, Python automatically tries to do this conversion so 
that you see the characters you expect. Here's an example of this 
conversion in action: 


>>> print u'Libert\u00e9' 
Liberté 


But you'll see an error if you try printing a Unicode string that 
contains characters that don't exist in your current locale. For 
example, if you try to print a Hebrew character with a locale that 
doesn't support any Hebrew encodings, you'll see this error: 


>>> print u'\u05d0" # Hebrew aleph character 
Traceback (most recent call last): 
Pile “<siecim>”", lime 1, im <mociule> 
__ builtin .Umicocesimeodeiewocs YaAscii" cocse 
can't encode character u'\u05d0' in 
position 0: ordinal not in range (128) 


Note that many modern operating systems support multiple 
encodings, so you might not see this error if you try it yourself. 


A twisty maze of codes 


Unicode lets you process text by using only one code. But Unicode can't be used directly 
with character input and output, so you need to encode output and decode input. 


Beginners tend to think of Unicode as another encoding of the character set they are used 
to using. Thus, they think that they need to encode "regular" text into Unicode and 
decode Unicode into regular text. That's backward. Here's the correct way to think about 
it: 


When you start using Unicode, whatever regular character set you're using is a part of 
Unicode. Therefore, you need to encode Unicode characters into your regular character 
set, and decode your regular character set into Unicode. 


Encoding, decoding, and other Unicode methods 


All the regular string methods also work with Unicode strings. A couple of methods apply 
especially to Unicode: encode () and decode (). 


The list of standard encodings is available at 
http://www.python.org/dev/doc/devel/lib/standard-encodings. html. 
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Converting Unicode to an encoded string 


To convert a Unicode string to an encoding, such as UTF-8 or Latin-1, you use the 
encode () method. Follow these steps: 


1. Create a Unicode string, like so: 
2. y = u"Libert\u00e9, \u00e9galit\u00e9, fraternit\u00e9" 


3. Type the name of the string, a dot, encode, and (in parentheses) the encoding 


you want to convert to. 
4. y.encode ('utf-8') 


This example shows the difference between the encoded and printed strings: 


>>> gq = y-encode('ut£-8') 

SOS Cj 

"Libert\xe3\xa9, \xc3\xa9gqalit\xe3\xa9, fraternit\xe3\xa9" 

SSS foie EF 

Liberté, égalité, fraternité 

Tip Python's file objects and stringIo objects don't support raw Unicode; the usual 
workaround is to encode Unicode text as UTF-8 before saving it to a file or StringlO 
object. 


Decoding a string into Unicode 


The decode () method, new in Python 2.2, assumes that a string is in the specified 
encoding and turns it into Unicode. To decode a string into Unicode, follow these steps: 


1. Create an encoded string, like so (this string is encoded using UTF-8): 
2, GF = Vibwlosicie \sces) \\xeaiG)! 


3. Type the name of the string, a dot, decode, and (in parentheses) the encoding 
that the string currently uses. 
4. q.decode ('utf-8') 


The Midas touch—Unicode versus str 


In Python, Unicode behaves a bit like Midas, the character in Greek legend who turned 
everything he touched (including his daughter) into gold. When you use a Unicode string 
and a regular string together, the results are always Unicode. 


>>> myunicodestring = u"Midas is coming! " 
>>> mystring = "Don't touch me, Midas!" 
>>> myunicodestring + mystring 

u"Midas is coming! Don't touch me, 

Midas!" 


This example shows the difference between the encoded and decoded strings. 


e Encoded string 
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e >>> 'Libert\xc3\xa9' 
¢ Decoded (Unicode) string 


® u'!Libert\xe9" 


\xe9 is Python's shorthand way of representing Unicode character \u00e9, which 
prints as é. 


Both strings look the same when printed: 


SSS OLA Cj 

Liberté 

>>> print q.decode('utf-8') 
Liberté 
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Chapter 7: Counting your Way to Fun 
and Profit 


In Python, numbers come in several different data types. Chapter 3 introduces the number 
types and their operators. This chapter gives you more information on integers, binary 
floating point numbers, and complex numbers, and provides guidance for using Python's 
augmented assignment operators. It also introduces Python's built-in math modules, 
including the decimal module for doing true decimal arithmetic, and the random module 
for generating random numbers. 


Integrating Integers 


Integers are positive and negative whole numbers (numbers without a value to the right of 
the decimal point). We give you a brief overview of integers in Chapter 3; the following 
sections describe some finer points about Python's integers. 


Why Python has two kinds of integers 
Because of how computers store numbers in memory, Python uses two kinds of integers: 


e Regular integers 
e Long integers 


In the recent past, most computers used 32 bits of memory to store an integer. But 
integers larger than a certain value (2,147,483,647, to be exact) take up more than 32 bits 
of memory. So Python uses a different mechanism to store them. Python represents long 
integers with the suffix L. 


TECHNICAL In future versions of Python, all integers will be handled the same 

STUFF way. For now, if you're writing programs that manipulate very large 
integers, each version of Python handles integers a bit differently. 
Your results may vary depending on the version. If you're using 
Python 2.2 or later, regular integers automatically convert to long 
integers, so you can mostly ignore the issue. 


Avoiding unexpected results with integer division 
If you enter only integers when you do arithmetic, Python returns the results in integers. 
This can give unexpected results if you're doing division with the / operator—you get 


only the integer part of the result. 


TECHNICAL In math terms, the / operator performs floor division, not true 
STUFF division. 
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For example, your math teacher would not let you get away with this result: 


SSS is 7 3 


4 


Tip If you want true division, use one of the following solutions: 


Use the decimal module, like this: 

>>> from decimal import Decimal 

>>> Decimal(13) / Decimal (3) 

Iyeverslsenst ll (AL, SiS) SiS}Sh 3S} 3) Si 33S} S33} S)S} 3} SS} StS) 5} 333) S}S}.3)5)')) 


The decimal module was introduced in Python 2.4. See the section, "Turning 
Python into a Calculator with decimal," later in this chapter, for details. 


Make sure that at least one of the numbers in a calculation is a floating point 
(decimal) number, like this: 

SSS 13.0 / 3 

A SIS ISITIIS IIS SS 


Use the division feature from the _ future _ module, like this: 


SSS Liem tuetiS  amjeoire Chivalsilen 
SSS 13 / & 
A SISIISSIS IIIS: 


Using future ___ activates code that will become the default in Python 3.0; 
in this case, it activates true division for the / operator. For integer division, 
use the // operator. 


Floating Along 


The standard format for computers to store real numbers is called binary floating point, 
which is different from the format most people are used to. For example, if you type 0.1 
into Python's interpreter, you see that Python's internal representation of the number is 
different from what you typed: 


Se Ooi 

0.10000000000000001 

TECHNICAL This happens because a computer can't accurately represent some 
STUFF decimal numbers as floats. For more information than you really 


need to know about why, see Appendix B of the Python Tutorial at 
http://www.python.org. (For detail nuts only: The decimal numbers 
that can be represented accurately have a fractional component 
containing only sums of powers of two.) 
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Automatic conversion 


Python automatically converts different types of numbers so that you (usually) don't need 
to think about the compatibility of different number types. For example, if you do a math 
operation with both integers and floats, Python converts the integers to floats and gives 
the result as a float. 


Formatting floats 


If you don't want to look at weird numbers with 16 decimal places, use the str () 
function to display the number as a string. 


Se a. Ak 

A ODI QOVI IOI I INS 

SSS sicie (4. 1) 

Leg els 

Warning You might be tempted to use the round () function to handle floating point 
numbers. But round() requires you to specify a precision (the number of digits 
to the right of the decimal point) and defaults to a precision of 0 (which 
removes ail the digits to the right of the decimal point). This probably isn't what 
you want if you're working with decimal numbers! 


Size limits on floats 


Python's binary floating point numbers have a finite range. Python stores binary floats 
using at most 17 decimal digits of accuracy. In addition, the largest float you can store is 
1.79769_10 *°8 (which Python represents as 1.79769e+308). If you try to calculate or 
store a larger number, you may get a result such as this: 


>>> x = 2,.79769e+308 
SS 
aLioie 


Long integers can be larger than floats, so you might get an "overflow error" or incorrect 
results when using large integers with binary floats or with the float () function (which 
converts numbers to binary floats). The following code attempts to convert 10 !°° toa 
float. Because the resulting number is too big for Python and the computer to handle, it 
raises an error: 


SSS illosie (1O1 OOO) 

Traceback (most recent call last): 

Pile Wesieciias”, lime i, am <mechile> 

OverflowError: long int too large to convert to float 


The following code also gives an incorrect result because of the 17-digit size limit on 
floats. (A number doesn't equal itself plus 1, no matter how big it is!) 


See 10.0 *~ LOO == (10.0 ** 100) + 2 
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True 


Imagining Complex Numbers 


Complex numbers have a real and imaginary part, both of which are floats. 


TECHNICAL A complex numeric literal is the sum of a real part and an imaginary 
STUFF part. 


In Python, to specify an imaginary number, or the imaginary part of a complex number, 
append j to the number. 


To separate the real or imaginary part of a complex number, type x. real for the real part 
or x.imag for the imaginary part, like this: 


SSS x = (S67) 
SSS weal 
350 
>>> x.imag 
6.0 


Augmented assignment 


Use augmented assignment operators to perform actions such as addition, concatenation, 
subtraction, and multiplication on an object and store the result using the same name. (An 
"augmented" assignment statement has an extra feature—an addition or concatenation 
operation—built in.) Here's why: 


e Augmented assignment statements are easier to debug. 


The statement counter +=1 clearly uses only one object. If you use the statement 
counter = counter + 1, you must read the statement carefully to make sure the 
two objects are the same. If you're trying to debug someone else's thousand-line 
program, you'll appreciate the simplicity! 


e Augmented assignment statements are more efficient. 


These statements evaluate an object only once and change mutable objects in 
place; regular assignment evaluates an object twice and creates a new copy in the 
process. This can slow performance with large objects. 


These are the most commonly used augmented assignment operators: 


+= (addition or concatenation): 
counter += 1 

-= (subtraction): 
counter -= 1 

S= (Milica licaAiciem)) ¢ 
counter *= 1 
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Using Math Modules 


Along with the arithmetic operators (covered in Chapter 3) and the basic math functions 
ofthe builtin module (introduced in Chapter 2) that's always available when you 
use Python, Python has several modules that provide math functions. 


Tip To import any of the math modules, type import followed by the name of the 
module. Use the following list to guide you to the module you need: 


e Real numbers: The math module has most of the functions you need for real 
numbers. To see all the functions, type this into the interpreter: 

e >>> import math 

e = =6>>> help (math) 

¢ Complex numbers: The cmath module, for complex numbers, has most of 
the same functions as the math module but returns results as a complex 
number type (for example, 3+6j—see "Imagining Complex Numbers," earlier 
in this chapter). 


The cmath module also returns values for calculations such as the square root 
of a negative number. (The math module treats such calculations as errors.) 


Warning If you're working with both complex and real numbers, use the 
module name (such as cmath.sqrt() Or math.sqrt ()) when 
calling a function. Many of the two modules' functions have the 
same names but work differently. The module name helps you keep 
track of which function you're using. 

Decimal numbers: The decimal module lets you work with decimal 
numbers rather than binary floating point numbers. 


Tip Decimal numbers give more accurate results for many calculations. See 
"Turning Python into a Calculator with decimal," later in this chapter. 
Random numbers: The random module includes tools for such tasks as 
generating random numbers and choosing randomly from several elements. 
For a fun introduction to the tricks up the sleeve of the random module, see 
Chapter 21. 


Turning Python into a Calculator with decimal 
The decimal module, new in Python 2.4, supports decimal floating point arithmetic. 
If you're writing applications that require control over precision and rounding (such as 


financial software or statistics modules), this module is for you. Its results also match 
calculations done by hand (math educators take note!). 
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The decimal module gives more accurate results than binary floating point math for 
these types of calculations: Calculating a percentage, modulo calculations (which return 
the fractional result of division), and equality tests. 


Like other numbers, decimal numbers are immutable. 


Tip To import the decimal module, use this statement: 


from decimal import Decimal 
Representing numbers by using the decimal module 


To create a decimal number, type the following, substituting your number or object inside 
the parentheses: 


Decimal (numerical object) 
The numerical object can be any of these types of data: 


e An integer 
3) 
¢ Acalculation 
O  Decaumail (35) 
e Anumber represented as a string 
e Decimal ("3.2") 
e A float converted to a string with the str() function 
QO S55 & = 3.2 
e = 6>>> Decimal (str (x) ) 
CO Deeanail ("3.2") 
Warning You can't use floating point numbers directly with the decimal module. 
They must be converted to strings first. 
e A three-item tuple 
O SS> Decimal ((O, (3S, 1, 4), =2)) 
e Decimal ("3.14") 
Tip To create a decimal from a three-item tuple, use this format for the tuple: 


e Decimal 


oe 


1. The first item in the tuple is 0 (for positive) or 1 (for negative). 
The second item is another (nested) tuple containing the digits of the 
number, one element for each digit. 


3. The third item is a positive or negative integer specifying the 
exponent. 


Decimal numbers work with Python's regular math functions. Give them names and use 
the names as arguments, or insert them directly as arguments: 


>>> pow(Decimal(9), 2) 
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Decimal ("81") 
Viewing and changing parameters 


The decimal module has parameters that control how it works with and displays 
numbers. To see the parameters, type the following: 


>>> import decimal 

>>> decimal.getcontext () 

Context (prec=28, rounding=ROUND HALF EVEN, Emin=-999999999, 
liter) S)C)SVONSIS)VE) 
capitals=1, flags=[], traps=[DivisionByZero, InvalidOperation, 
Overflow] ) 


One of the parameters is the maximum precision of decimal numbers (how many digits to 
the right of the decimal point you see). To change the precision parameter, type 
something like this: 


>>> decimal.getcontext().prec = 7 

Tip If you want to change the precision of a specific number without changing the 
precision parameter for all the numbers, use the quantize () method. quantize () 
rounds the number to a specific number of decimal places. This example rounds to 
two decimal places: 


>>> data = Decimal('7.324') 
>>> data.quantize (Decimal ('.01"')) 
DSe smell ("7 532) 
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Chapter 8: Processing Lists and Tuples 


Lists and tuples are two of Python's compound data types, also called container objects. 
Compound data types group collections of data. 


Python has several kinds of compound data types. Lists and tuples are sequence types 
because each element of data is numbered sequentially, starting with 0. Sequence types 
support operations called indexing and slicing for working with individual elements or 
subsets of elements. You can find out about indexing and slicing in Chapter 6. 


A sequence data type is good when you want to operate on all the elements in a collection 
or when the order of elements in a collection is important. (In contrast, the dictionary data 
type—a mapping type—is for random access to elements, and a set is for comparing 
collections of objects. Dictionaries and sets are covered in Chapter 9.) 


This chapter shows you the syntax for lists and tuples, when to use them, how to work 
with elements inside them, how to use them in loops, and how to avoid some of the 
common errors programmers make when working with them. 


Introducing Lists and Tuples 


The following sections describe the features and syntax of lists and tuples and explain 
when to use each. 


What a list is 


A list is a mutable data type, which means you can change the contents of a list without 
creating a new list. The elements of a list can be of different data types. A single list can 
contain numbers, strings, other lists, tuples—and even functions and classes. 


In Python, a list /iteral (the actual data, not a name referring to the data) is defined by 
square brackets surrounding zero or more elements. Elements are separated by commas. 
Below are examples of lists: 


a disc = [iinis?, Yaisv, “al, Filise’] 
empey list = {] 
aimoemer list = [5] 


Lists come with a variety of built-in methods that let you manipulate them. For example, 
to sort a list, you write code like this: 


>>> mylist 
>>> mylis 
SS miy lls 
[l, 4, Sy 8, 29, 33, 98] 


SS Nhe, Sore eae <li ays | ech 
5 Soueie () 


(fue (eur (eu 
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What a tuple is 


A tuple is a sequence data type that can contain elements of different data types. Tuples 
are different from lists because tuples are immutable. If you want to change the contents 
of a tuple, you must create a new tuple that has the new content you want. 


Warning If a tuple contains a mutable element (such as a list or dictionary), the mutable 
element can be modified. However, some of the obvious ways of changing the 
mutable element will cause errors because Python will attempt to modify the 
tuple. We recommend that you avoid changing mutable elements in tuples. 


Tuples don't include a lot of special methods, the way strings and lists do. But tuples do 
include methods that support standard Python operators. For example, you can add 
tuples, check whether an item is inside a tuple, slice elements in a tuple, compare a tuple 
to other data, find a tuple's length, and so on. 


Many Python functions, such as the time and date functions, return tuples. 


Choosing between lists and tuples 
Lists and tuples look similar at first glance, but they have intrinsic differences: 


e A tuple is immutable, and a list is mutable. 
e Tuples are more memory-efficient. 
e Lists have several useful built-in methods. 


Tuples are often better than lists for information that you don't want to change, for two 
reasons: They use less memory, and (because they're immutable) they won't change 
unexpectedly. 


Lists are often better for information you want to change because there are more ways of 
manipulating them. 


Guido (Python's creator) and the Python community promote the following conventions 
for choosing between lists and tuples: 


¢ Tuples for heterogeneous data, lists for homogeneous data: Use a tuple if your 
data includes several different data types, such as names and addresses; use lists 
for elements that are all of the same type. 


For example, if you're accessing a database, the fields of a record might be 
returned as a tuple, but the records themselves should be returned as a list. 
Therefore the overall database would be a list of tuples. 

e¢ Tuples for sequence keys: If you need to use a sequence as a dictionary key, you 


must use a tuple because dictionary keys are immutable. 
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¢ Tuples for some functions: Some functions require arguments to be passed in 
tuples. 

e Lists for mutable objects: Avoid using mutable objects in immutable containers 
because errors and unexpected results can occur if you attempt to change the 
mutable objects. 


A tuple literal is defined in one of three ways: 


e Asingle element followed by a comma 
e Multiple elements separated by commas 
e An empty set of parentheses 


When Python outputs a tuple, it encloses the elements in parentheses. You can (and 
usually should) use parentheses when creating a tuple, but you don't have to. Below are 
examples of tuples. 


P2> a tuple = ("tmis’, Yis!, "al, ‘euple’) 

PSS tuple 2= “ehis”, "is", Yamotaere”, “irwele” 
PSS icwjole 2 

(‘eae !’, Yas", Vainoklnee, Viwple") 

PS> sultry cwujple = (() 


A tuple (except for an empty tuple) must always use a comma, even if there is only one 
element, because the comma is the operator that defines a tuple. 


Tip Because the comma is overloaded (that is, it does more than one thing in Python) and 
because it has low precedence (that is, most other operations are evaluated before 
comma operations), it's usually a Good Idea to add parentheses to make it clear that 
you are using a tuple. 


Below are the results of using the * operator on a single-element tuple and on an integer. 


PSS aMm@Otmnec cule = (3, ) 
PPS ametiner cujsle * 3 
(35 Sy 3) 

PrES MOE A icwjole = (3) 
PrS MO a wujsle * 
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Manipulating Sequence Objects 


The following sections describe how to compare, combine, and search the contents of 
lists, tuples, and other sequence objects. 


Comparing sequence objects 


Sequence objects such as lists can be compared to other objects of the same data type. A 
comparison tells you whether the objects are equal or whether one is smaller than the 
other. Here's how it works: 


e The first items of each sequence are compared, and then the second items, and so 
on. 

e If an item inside a sequence is itself a sequence, then each of its items is 
compared against the item of the same number in the other sequence. 

e If all items of two sequences are equal, the sequences are equal. 

e Ifsequence | is the same as the beginning of sequence 2, sequence | is smaller 
(lesser). 

e Strings are compared based on the ASCII order of their characters. Chapter 6 has 
more information about character order. 


Table 8-1 shows the results of comparisons between sequences. 


Table 8-1: Comparisons between Sequences 
=™Open table as spreadsheet 
Comparison Reason 


(1, 2, 3) < (1, 2, 4), [1, 2, 3 is less than 4 
3] < [1, 2, 4] 


"ABC' < 'C! < 'Pascal' < ASCII order of characters 
"Python' 


(1, 2, 4) > (1, 2, 3, 4), (1, First sequence is the same as the beginning of 
2) < (1, 2, -1) second sequence 


(1, 2y. Bh == Uy 20, 2.0) Integers equal their float equivalents 
(1, 2, (‘aa', ot < (1, 2, — Alphabetical order of characters in sub-sequence 
(tabc', 'a') ) 


(‘aa’ is smaller than ‘abc’) 


If you compare objects of different types, the types are ordered alphabetically by their 
names: Lists are smaller than strings, which are smaller than tuples. The rules for 
comparing objects of different types may change in future versions of Python. 
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Operating on sequence objects 


The following operators work on lists and tuples. Most of these operators also work on 
other data types: 


e Addition (+): Concatenates the contents of lists together to form a new list, or 
concatenates the contents of tuples together to form a new tuple. (You can't 
concatenate a list and a tuple.) 

Ss al, 2 Sl ee ee A] 

©) pee 

e Sequence multiplication (*): Repeats the contents of a list or a tuple. 

e Augmented assignment (+= and *=): Simultaneously concatenates and assigns or 
repeats and assigns. The statement mylist += newlist is the same as mylist = 
mylist + newlist. The former works slightly faster and is easier to read. 

e Comparison (< >, ==, !=): Tests whether the contents of two or more lists or two 
or more tuples are the same. 

e Contents test (in): The keyword in determines whether a list or tuple contains 
the given element and returns True or False, as shown here: 

O S>> Phone soumds = (‘iseeo', Yieime!, “tligiic or tle loumoleloes” ) 


OQ 22> Wy = “sieejo’ alin) jelaveusve)_‘srowulnvels 
OO SSS jy 
Mee 


Listcraft: Methods, Indexes, and Slices 


In this section we describe important ways you can work with lists. You can turn other 
data into lists, count items in lists, sort lists in various orders, and use indexing and 
slicing to change, add, delete, and move individual list items. 


Functions that work with or create lists 


The list () function creates a copy of a list. It also turns any other sequence object or 
iterable into a list. (An iterable is an object whose elements can be retrieved one at a 
time.) This example turns a tuple into a list: 


>>> mytuple = ('apple', '‘orange', 'pear') 
>>> mylist = list (mytuple) 

>>> mylist 
['apple', 'orange', 'pear'] 


If you feed a string to the 1ist () function, it makes a separate list item out of each 
character, as in this example. 


>>> mystring = "Boo!" 
>>> list (mystring) 
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Vit Owe, iO, Palate) 


To find out the number of elements in a list, use the 1en() function (it also works on 
other data types). 


>>> len(mylist) 
3 


Methods of lists 


A method is like a function except that it "belongs" with a particular object, and when 
you call it, you qualify it with the name of the object. The method names are attributes of 
the object. 


List method syntax 
To call a list method, type the list name, a dot, the method name, and parentheses. In the 


parentheses, pass any arguments the method needs. (Some methods don't require or use 
arguments.) Here's an example: 


list_name.method_name (arguments) 
The most popular list methods 
The following are some of the most frequently used list methods: 


e To add an item to the end ofa list, use the append() method. This method 
changes the list in place. It returns None. 


This example adds the string 'apple' to a list: 


basket 
basket 


= ['apple', 'banana', ‘orange' ] 
-append('apple') 


e To find out the number of times a value occurs in a list, use the count ( value ) 
method. 


This example counts the number of times the string 'apple' appears. 


>>> basket.count ('apple') 
2 


e To add to a list the contents of another sequence object or iterable (an object 
whose elements can be retrieved one at a time), use the extend() method. If the 


iterable is a string, each character is added individually. 


This example adds each character of the string 'pear'. 
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>>> x = ['apple', 'apple', 'banana', '‘orange'] 

>>> x.extend('pear') 

SSS ox 

[Yajole”, Yeysjole’, Yoainaime’, Yoreingo’,; Yo’, Yel, Ya, Vie'll 


e To delete the first occurrence of an item in a list, use the remove () method. It 
raises a ValueError if the item isn't found. 


This example removes the first 'apple' string. 


>>> x.remove ('apple') 
>>> x 
[iapple’, Vseweme", Yoremea’, “ol, Yet, Tal, Vel] 


e The sort () and reverse () methods both change the list itself. 
o The sort () method's default ordering is alphabetical for lists containing 
text, numerical for lists containing numbers, and so on. 
o The reverse () method inverts the positions of items in a list. 


Tip Python 2.4 and later also support these built-in functions: 
o sorted(), which returns a sorted copy of the list (or other iterable). 


o reversed(), which returns an iterator object that lets you use a loop 
to process the items in the list in reverse order. 


The following examples show the use of the sort () and reverse () methods ona 
list of strings: 


Se Seer () 

SSS ok 

ve, “eleyolket. Vice; Yel’, Vowemeest, Viol, vial] 
>>> x.reverse() 

SS ox 

[Vaev, “Vol, VYorwaimneeY’, el, Woainaime’, Yajsjole’, Ya] 


List indexing and slicing 
List and tuple elements have index numbers. Indexing and slicing operations on lists and 
tuples work mostly the same way that they do with strings. Here's a quick review of how 


index numbers work. For more about indexing and slicing, see Chapter 6. 


e Working from left to right, the first element has the index 0, the next has the index 
1, and so on. 


This example shows the items at index 0 and 1 of the list x. 
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>>> x = ['apple', "banana', 'pear'] 
SSS sx O] 
‘apple' 
SSS x [[ 1] 
"banana' 


e Working from right to left, the first element has the index -1, the next one -2, and 
so on—but the leftmost element is still 0. 


This example shows the items at index -1 and -2. 


>>> x[-1] 
"pear! 

>>> x[-2] 
"banana' 


e The beginnings and ends of slices sit between the elements, like notebook 
dividers. 


This example shows that slice 0:1 contains just one element. 


SSS sx Oeil] 
['apple'] 


How indexing and slicing are different with a list 
Indexing and slicing work slightly differently with lists than they do with strings and 
tuples because lists are mutable. You can change a list by using indexing and slicing. 


With a tuple or a string, you need to give a new name to the changed object (or reassign 
the same name). 


Retrieving items from a list with slicing 
Following is a brief review of slicing syntax. You can find more information in Chapter 6. 


The syntax for slicing is as follows: 


SSCGMEACS GClojSCie mene [lw dwsicdinclexs lasiclinclex3 SieSio] 


These are the rules for slicing: 
e You can leave out the first slice index (but not the colon following it). It 
defaults to 0. 


e You can leave out the last slice index. It defaults to the last item of the sequence. 


To access the last sequence item in a slice, if you don't know the length of the 
sequence, leave the index blank. 
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e You can leave out the step and the colon preceding it. The step defaults to 1. 

e When you specify a range, it begins with the first slice index specified and 
ends before the last slice index specified. This is because slices point between 
elements. So [0:2] retrieves the first and second elements. 

e The notation [:] returns a shallow copy of the whole sequence. 


The section, "List references that unexpectedly change," later in the chapter, has 
more on shallow copying. 


The following example code creates a list of numbers and then, using slice syntax, selects 
and displays all the even-numbered items. 


>>> mylist = range(0,10) 
SSS myliasic 332] 
[Op 2, 4, G, 8] 


Changing a list item by using its index number 


One way to change the contents of a list is to assign a new item to a particular index 
number (also called an offset). This replaces whatever is already at that index number. 


Gotcha! Assigning numbers and strings to slices 


If you assign to a slice, you must assign an iterable. This means two things: 


You can't assign a number to a slice. 
, S2> mylist = [i, 2, 3, 4, Sy, Gl 
, SS> mylisic [233] = 8 


a 


Traceback (most recent call last): 
mile “<eiechiine”, dating: il, align 2 
TypeError: can only assign an iterable 
If you assign a string to a slice, each letter becomes a separate list item. 
>>> mylist[2:4] = 'three' 
>>> mylist 
Mal, Bp Mev, Val, Viet, Vet. Met. By a] 
11l.way around this second problem is to put the number 


WOON A OB WN 


One way around this second problem is to put the number or string into a list. The 
following example assumes you want to replace the numbers 3 and 4 with the strings 
‘three’ and ‘four’: 


SSS imyllist = i, 2, 3, 4, S, Gl 
SSS imllise( 234] = | Veleee!, "item! | 
SS> mylisic 


lil, 2p Velieas!, Yirouse’, 5S, Gl 


To assign a new item to a list by index number, follow these steps: 
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1. Type the list name and then, in square brackets, type the index number, like 
this: 

ZA, mpl ase. i] 

3. On the same line, type = and the new list element, so your line looks 


something like this: 
4. mylist[1] = new_list element 


In the following example, the element at index 1 (the string 'eggs') is replaced with the 
string "lovely spam". 


>>> L = ['spam', ‘eggs', ‘ham'] 
>>> L[1] = 'lovely spam' 
SSS ik 


['spam', 'lovely spam', ‘'ham'] 


You can't assign to an index number that doesn't exist. Python raises an IndexError, as 
shown here: 


>>> L[3] = 'green eggs' 

Traceback (most recent call last): 

Pile Yesieciiias”, lime i, im <mochile> 
IndexError: list assignment index out of range 


To add items to the end of a list, use the append () method: 


>>> L.append('green eggs') 
SSS ik 
['spam', 'lovely spam', 'ham', 'green eggs' ] 


Assigning to a list with slicing 
To assign several items in a row to a list, use a range of index numbers. 
Assigning using a slice follows these rules: 
e Python first deletes the items currently assigned to those index numbers and then 
inserts the new items. 
e You don't have to assign the same number of items that you delete. 
e¢ You must assign an iterable (see the sidebar, "Gotcha! Assigning numbers and 
strings to slices"). 
To assign items to a list slice, follow these steps: 
Type the name of the list and, in brackets, type the slice notation, like this: 


1. 
ea ee 
3. On the same line, type = and the items you want to assign. 
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If you assign items from another list, your whole line might look something like 
this: 


ji(ils4| = [sam , Ysa’, ' sjoeun’ || 
The resulting list now looks like this: 
[‘sjoemi', Vsjoemm', ‘sjoem’, Visjoeni' | 


The following example manipulates a list of words by using slice assignment: 


>>> L = ['spam', ‘eggs', ‘ham', 'green eggs'] 
SSS bi s2i = (come, Mireinic)) 

S55 1h 

[“clom'e, Yiueume!, Vine! VGuecin aces ! || 

Sea Ins] = ("eredin eoes', Weumnel’, avenin'’) 

SSss Ih 

["don't", 'want', 'green eggs', 'and', 'ham'] 


List indexing and slicing operations 
The following operations use the indexing and slicing features of lists: 


e To return the index number of a list item, use the index () method. 


If the item isn't in the list, it raises a ValueError. 


This example returns the index number of the second item in the list: 


>>> mylis 
>>> mylis 
il 


= [iy ttewo', 3] 
. index ('two') 


¢ To remove from a list the item(s) corresponding to the given index number or 
slice, use the keyword del. 


This example deletes two items. 


SSS myllisc2 = | VMewisy sive’, 'tesic lois Sigimal’, Yino Sitjeln 
number', ‘at the 

tone, the time will be'] 

>>> del mylist2[1:3] 

>>> mylist2 

['busy signal', 'at the tone, the time will be'] 


e To insert an item before a particular index number, use the insert () method. 
e 6>>> basket.insert(0, 'peach') 

e 6>>> basket 
° ['peach', ‘apple', 'banana', ‘orange' ] 
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e To remove and return an item at a particular index number, use the pop () method. 
The default index number for pop () is the last item in the list. This example 
removes and returns the item 'baked beans' in the list breakfast: 


e >>> breakfast = ['spam', 'spam', 'spam', 'baked beans', 'spam', 
"spam! ] 

e 6>>> breakfast.pop (3) 

° "baked beans' 

e = 6>>> breakfast 

e ['spam', 'spam', 'spam', 'spam', 'spam'] 


The following code uses a while loop to return and delete all the list elements one 
at a time: 


>>> while breakfast: 
. breakfast.pop() 

"spam' 

"spam' 

"spam' 

"spam' 

"spam! 

>>> breakfast 

[] 


Steering Clear of List Gotcha's 


This section describes how to avoid some common list errors. 
Simultaneous test/add/delete 


If you're testing a list condition in a while loop or iterating over a list in a for loop, 
avoid adding and deleting list elements at the same time. It's easy to lose track of how the 
list is changing. An exception is the use of pop () in a while loop to remove all list 
elements one by one. 


List references that unexpectedly change 

Because lists are mutable, you can change them without making a copy. But if you 
reference a single list from several places in your code, it might change when you aren't 
expecting it to. 

Shallow copying 

If a container object (an object—such as a list, tuple, or dictionary—that holds other 
objects) contains lists, then making a copy of the container object by using 


newcontainer = oldcontainer Of newcontainer = oldcontainer[:] doesn't make a 
copy of the lists it contains. The new container continues to refer to the original lists. This 
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is called shallow copying, and Python uses it because (among other reasons) it saves 
resources. The following example illustrates shallow copying. The larder list changes 
when one of the lists inside the food list is changed: 


>>> breakfast = ['spam', 'baked beans'] 
>>> basket = ['peach', 'apple', 'gooseberries'] 
>>> food = [breakfast, basket] 


>>> larder food 
>>> larder 
[['spam', 'baked beans'], ['peach', ‘apple', 'gooseberries']] 
>>> del basket[0] 

>>> larder 


[['spam', 'baked beans'], ['apple', 'gooseberries']] 


Deep copying 

Tip Sometimes you want to make a full, or deep, copy of a list (or any item that contains 
other items). To do so, import the copy module and use the copy. deepcopy () 
function. In this example, when you copy the food list to the larder list by using the 
deepcopy() function, changing the contents of one of the lists in food doesn't 


change the contents of larder. 
SS ioc! 
[['spam', 'baked beans'], ['apple', 'gooseberries']] 
>>> import copy 


>>> larder = copy.deepcopy (food) 

>>> larder 

[['spam', 'baked beans'], ['apple', 'gooseberries']] 
>>> del basket[0] 

SSS ir@eyel 

[['spam', 'baked beans'], ['gooseberries']] 

>>> larder 

[['spam', 'baked beans'], ['apple', 'gooseberries']] 


Disappearing lists 


The list methods append() and sort () change a list in place but don't return the list. 
They return None. Don't use these methods in assignment statements or else the list might 
disappear, as in this example: 


SSS ih = (Yay, “ibis! ] 

>>> L = L.append('x') # wrong! 
SSS joreiiaie Ib 

None 


To append to a list, just use the append() method by itself. Don't give the result a name. 


>>> Thy = aiyAY WL Sie | 
>>> L.append('x") 

SS joie iimic Ib 

| any" Wales Eas Uae 
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That tricky asterisk 


If you repeat (*) a list, you get a list with multiple copies of its elements. But if you 
repeat (*) a list containing a list, you get multiple shallow copies of the contained list. 
The following examples demonstrate both repeating a list and repeating a list that 
contains a list: 


e A repeated list: 


Oo Ss> myllist = (3, VWolaimel’, Vimaee’ | 

OQ Se> x = myllistr * 2 

e >>> x 

Oo [ap “elaine, Yiskea", 3, Veilalinel!, Vnnlee 

e Repeating a list containing a list: 

OQ SSS iy = aylliisie = 2 

e Sosy 

oO |S, Yisilaincl!, Yankee’, [S, Viswhainel, Yuaee’ | I 


In the preceding examples, if you change mylist, the value of y changes because y 
contains references to mylist. But the value of x doesn't change because x only contains 
copies of the elements of mylist. Here's what happens: 

We change the first element of mylist: 

SSS wyllisic (Oj) = “iewo’ 


Here's the result: 


e y reflects the new version of mylist: 


ee >>> y 

e [i Miewoe’, Wollaiayel", “inves! , [Viewo’, VYoilalinel!, Vindale” | J 
e x doesn't change: 

e Pees 

e [3, “olaiacl’, Vinee’, Sy “Youlatinel”, Yiakee” | 


Performance problems 


Manipulating large lists can cause performance problems. Here's how to avoid these 
problems: 


e Avoid copying long lists. 
For example, avoid using loops that delete elements from the beginning of a list 
(del mylist[0] ormylist.pop (0) ) because doing so requires Python to make 


multiple copies of the list. 


e Avoid making multiple copies of the elements of a list. 
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Copying the list itself doesn't take a lot of time, but copying the elements does. 


e Build lists when you need them. Populating a list with elements before you need 
the list isn't necessary, and it might even waste memory. 


Building Lists, Stacks, and Queues 


This section describes two often-used list tasks: building a list one element at a time and 
building stacks and queues. 


Building lists incrementally 


Because lists are mutable, you can build them one element at a time. It is often useful to 
build lists from other iterable objects or to combine multiple lists. The easiest way of 
doing this is to write a for loop and build the new list with the append () method, as in 
the following example: 


Ll = ['spam', 'spam', 'spam', ‘baked beans', None, 'spam', 'spam'] 
be = [] 

E@re a atin) Til 

if i is not None: 

L2.append (i) 


After you run the above code, the value of L2 is ["spam", "spam", "spam", "baked 


beans", "spam", "spam"]. 


Warning If you're iterating over a list in a for loop, it's best not to add elements to or 
delete elements from that list—doing so can introduce errors. It's safer to use 
code like the above example to build a new list and then change the elements of 
the new list. 


Python provides several other tools for building lists: 


e To convert or copy sequences (and other iterable objects) to a list, use the list () 
function (described earlier in this chapter). 

e Another way to copy a whole list or other sequence and return a shallow copy is 
to use this slice notation: mylist[:] 

e Python includes a feature called a list comprehension, which works in much the 
same way as a list-building for loop but is packaged more compactly. (See 
Chapter 16 to find out about this advanced feature.) 

e To loop over two or more sequences at the same time, use the zip () function. It 
returns a list of tuples that associates the items in one sequence with the 
corresponding items in another sequence. It also truncates the resulting list to the 
length of the shortest sequence. 
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Oo Serer = (thy Sie Sy IE 

O° SS Ip ("eggs', 'spam') 

DY SS> © = wijo(a, 15) 

O SSS jowime © 

° Ite, *egeis’), (Ss, sjocum') | 


If you use zip () again with the «args style parameter, you get a list of tuples that 
correspond to the original sequences you started with (but any items that were 
truncated are missing). 


Tip The *args style argument is explained in Chapter 11. 


Here's what it could look like: 


Se wile ((e) 
[(3, 3), Ceges!, "spem) | 


e To get tuples of list index numbers and their elements, use the enumerate () 
iterator. 

Oe SS es = [tty Se oy ab 

e >>> for I, item in enumerate (a): 


e soo joiealime (It, alice) , 
Sen Ones Gye oes Seen 
Stacking and queuing with lists 


Stacks and queues are concepts that describe how items get added to and removed from 
lists. Stacks and queues are not Python objects, but you can write code in which a list 
behaves like a stack or a queue. 


A stack is like a spring-loaded plate dispenser where the last item on is the first item off 
(this is also called "last in, first out" order, which is inscribed in the Great Book of Geeky 
Acronyms as "LIFO"). A queue, as folks familiar with British English know, is like a line 
at a store checkout counter: The first person in is the first served (also called "first in, first 
out," or "FIFO"). 


Adding items to a stack 
To add an item to the top of a stack, use append (). 


This example puts six plates on the pile: 


>>> plates = [] 
>>> for p in range(6é): 
. plates.append (p) 


>>> plates 
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Retrieving items from a stack 


To retrieve an item from the top of a stack, use pop () without an argument (so it defaults 
to the last item in the list). 


To retrieve all the items, use pop() 1n a while loop. Get the plates off the pile, last-on, 
first-off, like this: 


>>> while plates: 
. plates.pop() 


(Sy [= [SS) (Gey SS. fon) 0 


Queuing up 

To use a list as a queue to process items "in the order in which they were received" 
(anyone who has tried to talk to a large company on the telephone will be familiar with 
that phrase), follow these steps: 


1. Use append () to add an item to the end of the queue. 


In the following example, we create an empty list and then append six callers to it: 


>>> callers = [] 
Sh iOie © iin iceinee (6) 2 


. callers.append (c) 


>>> callers 
I@, te 2p 37 4p Si] 


Use pop (0) in a while loop to retrieve an item from the front of the queue, 
like this: 


3. >>> while callers: 
4. . Ccallers.pop (0) 
Beer cer 

65 0 

ieee ll 

is 2 

9, 3 

10.4 

Li. 5 


Warning To avoid performance problems, don't use pop (0) with a large list (more 
than about 100 elements). 
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Taking Tuples in Hand 


This section describes some ways to work with tuples. We cover converting other objects 
to tuples, several ways of creating tuples, using tuples to swap values, and efficiently 
giving names to individual items in tuples. 


Converting another object to a tuple 


To convert another iterable object to a tuple, use the tuple () function: 


SS> miylisic = |i, 2, 3] 

>>> tuple (mylist) 

(1, 2, 3) 

REMEMBER If you convert a list to a tuple, remember that the new tuple cannot be 
changed. 


Tuple packing 


If you type some names separated by commas, you create a tuple. This is called tuple 
packing. It works on either side of an assignment statement. 


The following example creates a tuple containing two strings: 


SSS cg = Tolga, loilein, lolli! 

SSS ie = VN GIINEHIR” 

2>> What they hear = q, © 

>>> print what_they hear 
(Nolin, lolela, lolleia’, ‘GiuNG@iHR" ) 


To create a tuple containing more than two items, just use multiple comma operators. 
This example creates a tuple containing four strings. 


>>> qr ry, qr Es 
(Mislaln, idileila, lollela’, YGIUNGHIRY,, Vigleilm, lolleim, Idileia’, YG IINGEIRY ) 


Tuple unpacking 

Unpacking a tuple means giving a different name to each element of the tuple. 

Tuple unpacking is useful because many functions in Python, such as the zip () function 
described in the previous "Building lists incrementally" section, return tuples. With tuple 
unpacking, you can easily get at the individual items in these tuples. 

To unpack a tuple, just assign multiple names on a single line. Put the names you want to 


unpack into on the left side of the assignment statement and put the tuple on the right 
side. You need one name for each item in the tuple. 


126 


The following example unpacks a tuple consisting of three lists: 


SSS x = [(licock emusings!,; Yseissors!), (josjoeie Coweies!, ieee!) , 
("scissors 

Gio’, Vjeeyersie)) || 

>>> rock, paper, scissors = x 


SSS joreiiaic OOK 
(‘rock crushes', 'scissors') 


If you don't have the same number of names as tuple items, you get a ValueError, as 
shown here: 


>>> rock, paper = x 

Traceback (most recent call last): 

Palle “<siechias, laine iL, satin <ukeychlles 

ValueError: too many values to unpack 

Tip You can actually unpack any sequence object in this way. The following example 
unpacks a string: 


SSS = Vsitcieiing! 
Sr wea ey (oun eh) dey Sie Oc 
Poe sek 


Ug 
SSS jowiimt a, lo, CG, cl, Gp, i 
Sc ic a im ¢ 


However, unpacking is most commonly used with tuples because there are better ways to 
get elements out of strings and lists. 


Using a tuple to swap values 


Because of the special way assignments work with tuples, tuples are the Pythonic way of 
swapping values. 


This example swaps the values associated with two names: 


SS> Ce = “WSillaln, loileila, loileia" 
>>> rr = 'GINGER' 
SSS Cp i 


(Violleia,, loileia, lolleiaY, GIINEEIR) 
SSh CG, eS e, Sf 

SSS Ci, ie 

("GINGER", ‘blah, blah, blah") 
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Chapter 9: Diving into Dictionaries 


Overview 


In this chapter, we discuss dictionaries (dicts for short) and sets, two data types that store 
multiple data elements. Here are some of the things you can discover in this chapter: 


e What dictionaries and sets are and how they differ from sequences 
e Some of the most useful dictionary tools and set operations 
e How to convert other types of data into dictionaries and sets 


Dictionaries are useful in the following general and specific circumstances: 


e When you need random access to elements. 
e When you want to focus on uniqueness. 


But consider using sets if you are using Python 2.3 or later. See "Setting Them 
Up," later in this chapter. 


e For mapping several spellings of a term to a preferred spelling. 

e For using a key to find a function to call (known as a dispatch table). 

e For looking up zip codes or bank account numbers. 

e For working with numeric indexes with only a few noncontiguous keys, when you 
don't need to use slicing. 


TECHNICAL Data that fits the preceding description is called sparse data. 


STUFF You can also use a list in this case, but a dict is more 
memory-efficient, which often makes things faster. 


Defining the Dictionary 
Here's how dictionaries (dicts) compare and contrast with other Python data types: 


e Dictionaries are a container type that stores data, like lists and tuples. 
e Dictionaries are a mapping data type — indexed by keys. 


A dictionary's elements (the items it contains) are key:value pairs. In contrast, lists 
and tuples store data by using numeric indexes (they are sequence data types) and 


support indexing and slicing. 


e Dictionaries are mutable. 
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That means a dictionary can be modified in place—you don't have to create a 
copy of it to modify it (as you do with strings and tuples). 


¢ However, a dict's keys must be immutable. 


Creating a dictionary 


The simplest dictionary is a pair of braces ({}), which defines an empty dict. 


REMEMBER To create a dictionary with some data in it, follow these steps: 


TECHNICAL 
STUFF 


1. 


Ae 


Type a left brace, a key, a colon, and a value (the information 


you want to associate with the key), like this: 
(“paiewcor’s “iPiimilingy HOI i yjoOiecls™ 


The key can be any immutable data type, such as a string or 
integer. Most keys are strings. The value can be any data type. 


For each additional element in the dictionary, type a comma to 

separate it from the previous value and then type another key, 

a colon, and a value: 

(“paicwcor’s “Pimiiang toi tjorcds”, YPoliy"s “Mx—pacicot” 
Tip Repeat Step 2 for each element you want to add.To end 

the dictionary, type a right brace: } 


If you added two elements, your dictionary looks something like 
this: 


("paiewor’s “Piimiling; ior igjo@imcs”, “Rolly”: Wlx—jesicieoie’)) 


Hash strings or hash browns? 


Some other languages have dictionary structures; they are called 
things like hashes, associative memories, or associative arrays. We 
think dictionaryis a friendlier term. 


Dict keys must be immutable because Python associates them with a 
unique number called a hash. A hash value (or hash) is historically a 
number generated from a string of text. Because the number and 
string pair are unique, hashes prevent problems that might occur if 
two different objects have the same value. 


Hashes are used for space-efficient storage, for security and 
cryptography, and so on. Another benefit of hashing is that after the 
hash is computed, keys can be looked up directly—there's no need 
for the computer to search through a whole list of keys. (In human 
terms, hashes help a computer be smart in the same way a human is 
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smart when she knows to jump to the end of a physical dictionary to 
look up a term that starts with Z.) In Python, this feature makes 
dictionaries very fast. 


In Python, most immutable objects have a hash, and the attributes of 
all namespace objects are stored in dictionaries. 


TECHNICAL Some details about dictionary keys: 
STUFF 


¢ Keys must be an immutable data type—for example, a string, 
integer, or tuple. 

e Ifyou use a tuple as a key, the tuple cannot have any mutable 
elements. 


For example, you can't use a tuple that contains a list. 


e You can type a key directly into the dictionary, or you can 
type the name of an object, like this: 

e >>> x = "mydictkey" 

e >>> mydict = {x: "mydictvalue"} 


Order in the dict 


Python doesn't store dictionary elements in any particular order. If you enter elements in 
one order, they may be stored in another (essentially random) order inside Python. Here's 


an example: 

SSS llimloscjack = {'sileejels YAN imabefaie’, Ywi@rele) 9 “ALI cleny'! } 
>>> lumberjack 

{weosels"s WAILIL cleyy’, "sllagels VAILIL imstelaic) }} 


The steps to make a sorted list of dictionary keys depend on your Python version. 


Python 2.4 and later 


If you have Python 2.4 or later, follow these steps to get a sorted (usually alphanumeric) 


list of the keys of a dictionary: 


1. Type a list name and = sorted, followed by the name of your dict in 
parentheses, like this: 

5 22> sorted keys = someed (lumoeryaci) 

To see the sorted list of keys, type the name of the list. 

5 SSS SOrEeC! keys 

['sleep', 'work'] 


Ow W bd 
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Earlier versions 


If you have Python 2.3 or earlier, follow these steps to get a sorted list of the keys of a 
dictionary: 


1. Type a name, an = sign, the name of your dict, and . keys (), like so: 
> 22 list of keys = llumoeiejack. keys () 
3. Use the sort () method to sort the list by typing this on the next line, 
substituting the name you created in Step 1 for list of keys: 
4s Se> liste OF keys. Sor (() 
5. Type the list name again to see the sorted list of keys, like this: 
6, 22> iis Or keys 
Yo sileei', Yanorels "| 
Warning Don't try to give a name to the result of list_of keys.sort(). The sort () 
method changes the list in place but returns None, not a list (to find out more 
about list methods and what they return, see Chapter 8). 


i) 


Doodling Around with Dicts 


This section describes some of the common tasks you can do with dicts, such as storing, 
extracting, adding, deleting, and searching for keys and values. 


Popular dict operations 


The main operations for a dictionary are storing a key and its associated value and using 
the key to extract the value. 


Storing a value 
To store a key:value pair in an existing dictionary, follow these steps: 


1. Type the name of the dict and a key in square brackets, like this: 
Ze Se intyclaerc || Vclieerciiilai" | 


REMEMBER The key can be new or it can be a key that's already in the 
dictionary. If you use an existing key, the new value replaces the 
old one. 

3. Type = and the value you want to store. 


Your line of code might look like this: 


>>> mydict['dracula'] = 'stoker' 


The following example code creates a dictionary and then creates a new key, 'dracula', 
and gives it the value 'stoker': 


>>> mydict 
>>> mydict 


= {'jabberwocky': 'carroll', 'sherlock holmes': 'doyle'} 
['dracula'] = 'stoker' 
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>>> mydict 
{'jabberwocky': 'carroll', 'dracula': 'stoker', 'sherlock holmes’: 
"doyle'} 


Here, the existing key 'dracula' is given a new value, 'oldman': 
, y 


SSS iihyelilcie | \cleaculle | = V@ileimera" 

>>> mydict 

{'jabberwocky': 'carroll', 'dracula': 'oldman', 'sherlock holmes’: 
"doyle'} 


Extracting a value 


There are two ways of extracting a value based on a dictionary key. The best option 
depends on a couple of factors: 


e If you're certain the key is in the dictionary, or if you want to raise a KeyError 
exception when the key is not in the dictionary, use the square brackets syntax. 

e If many of the keys you're searching for aren't in the dictionary, or if you want 
Python to return a value you specify rather than raise an exception when keys 
aren't found, use the get () method. 


To raise an exception on failure 


If you want to raise an exception if the key you're searching for doesn't exist, type the 
name of the dict followed by a key in square brackets, like this: 


>>> mydict['jane eyre'] 

Traceback (most recent call last): 
Pile Vesiechias”, ilsiaes i, alia <inechnile> 
KeyError: 'jane eyre' 


To return a value you specify on failure 
To return a default value if the key doesn't exist, follow these steps: 
1. Type the name of the dict, a dot, get (, and a key. 
Your code might look like this: 
>>> mydict.get('jane eyre' 


2. (Optional) Type a comma and a string you want to return if the key doesn't 
exist. 


If you don't specify a string, the default value None is returned. 


>>> mydict.get('jane eyre', 'key not found' 
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3. End the line by typing a right parenthesis: ) 


Your completed line of code will look something like this if the key doesn't exist: 


>>> mydict.get('jane eyre', 'key not found') 
"key not found’ 


Finding out about dict methods 
Dictionaries come with built-in methods that allow you to perform the following tasks: 


e Retrieve items from dicts by using items (), keys (), and values () methods. 

e Use dicts in loops with iterkeys(), itervalues(), and iteritems() methods. 

e Test whether a key is in a dict by using the in keyword. 

e Delete items from dicts by using the del keyword and the popitem() and pop () 
methods. 

e Make shallow copies of dicts by using the copy() method. 

e Add items to dicts by using the setdefault () method. 


Retrieving items from dicts 


The keys (), values(), and items () methods are good for list-processing operations 
using the contents of dictionaries. The pop () and popitem() methods retrieve and delete 
things at the same time, so they're good for processing a temporary dictionary you don't 
need to keep or for processing items one at a time until there aren't any more. 


items() 


The items () method retrieves all the key:value pairs in a dictionary and returns them as 
a list of tuples. Each key:value pair is a tuple. 


To retrieve all the items in a dictionary, type the name of the dictionary, a dot, and 
items (): 


>>> mydrck — {nbed parcel as 300s pounds tbedwlengEn "25 teen longs} 


>>> print mydict.items () 
[( Yoel eine", “2 ireeic jkevete!)),, (Voecl jories!, ts) joxeruarels;)) | 


keysQ) 


The keys () method retrieves all the keys in the dictionary. It returns a list, as shown 
here: 


>>> print mydict.keys () 
['bed length', 'bed price'] 


values() 
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The values () method retrieves all the values in the dictionary. It returns a list, as shown 
here: 


>>> print mydict.values () 
[JZ oor ilhoie’, WisOW joowuncls” || 


popitem() 


To retrieve a single key:value pair and remove it from the dict, use the popitem () 
method, like this: 


>>> mydict.popitem() 
(Vioeel ileineficia’, V2 ic@oIe Ihe") 


Tip You can't use popitem() to specify which key:value pair to retrieve. It's most often 
used in a while loop when you want to process items one at a time until there aren't 


any more but you don't care what order they're processed in. 
TECHNICAL The popitem() method was added in Python 2.1. 
STUFF 


pop0) 


To remove a key:value pair and retrieve only the value, use the pop () method and pass 
the key in the parentheses, like this: 


>>> mydict.pop('bed price') 
"800 pounds’ 


Getting loopy with dicts 


You can use a dict's keys, values, or key:value pairs in a loop. This is called iterating 
over the items. 


TECHNICAL The keys (), values (), and items () methods also work in loops, 
STUFF but the iterator methods are more efficient. 


Here are several ways of looping with a dict. 
Keys 
To iterate over keys in Python 2.2 and later, create a for statement with the following 


syntax, substituting the name of your dictionary (you can also use a different name for the 
iterator, although key is standard for keys): 


for key in mydict: 


TECHNICAL The preceding code works only in Python 2.2 and later. It is 
STUFF equivalent to the following in earlier versions: 
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for key in mydict.keys(): 
Values 


To iterate over values, use this syntax in the for statement: 


for value in mydict.itervalues(): 
Key:value pairs 


To iterate over key:value pairs, use the following syntax: 


for key, value in mydict.iteritems(): 

Testing a dict 

How you test whether a single key appears in the dict depends on your version of Python. 
(You might want to do this if, for example, you wanted to see if a customer number was 
in your dictionary.) 


Python 2.2 and newer 


To check whether a single key is in the dictionary, use the keyword in within an if 
statement, like this: 


aie Vioyeyel joueakiets4 alia imyielaieie 9 

Older versions 

If you are using Python 2.1 or earlier, use the has_key() method instead: 
Lit myclict Ines kexy7((Voecl ieee” ) 3 

Deleting items from a dict 


You can delete a specific key:value pair from a dict by using the de1 keyword. If you're 
working with the dict in a loop and doing something with the dict data, using pop () or 
popitem() is more efficient; they return and delete elements at the same time. 


Key:value 


To delete a key:value pair from a dictionary, use the de1 keyword and give the key in 
square brackets. For example: 


del mydict['bed price'] 


Individual elements 
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To delete elements from a dict one at a time in a loop, use either of these methods, 
depending on whether you want the key:value pair or just the value returned: 


* popitem(): The popitem() method returns the key-value pair and deletes the 
pair from the dict. 

e while mydict: 

e key, value = mydict.popitem() 

¢ pop(): The pop() method returns the value and deletes the key:value pair from 
the dict. 


e for key in mylist: 


° if key in mydict: 
° value = mydict.pop (key) 
All elements 


To delete all the elements of a dict at once, use the clear () method. This removes all the 
elements at once (leaving an empty dict) and returns None: 


mydict.clear() 

Making a shallow copy of a dict 

To make a shallow copy of a dict, use the copy () method. 
mycopy = mydict.copy() 


See the sidebar, "Understanding shallow copies," to find out more about shallow copies 
of dicts. 


Get or set: Adding items to a dict 
The setdefault () method (Pythonistas call it get or set) is like get (), which extracts a 


value by using a key. But setdefault () goes one step further: If the key is not found, it 
adds the key to the dict. 


Understanding shallow copies 
To understand shallow copies, you need to know something about how Python stores 


information. When you give a name to an object such as a list that stores some values, 
Python creates a reference (the name) to that object (the stored values). 


SSS mylise = [amy , "liste’] 
Now, say you create a dictionary that contains this list: 


SS> mycilete = {lanymumlosie’s 1, Yalisice’s imylisic, "myeupile!s (il,2) } 
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If you look at the dict, it looks like the values stored in the list are stored in the dict, too: 


SSS myclilce 
{"myiewjole"s (i, 2), Yaiisi's [Ymy’, Ylise’], “mymumloer’s i}! 


But in fact the dictionary is not storing the actual values stored in the list. It is only 
storing the reference to the list. 


That means, if you change the list... 


>>> mylist.append('changed') 


... your dict also changes. 


>>> mydict 
{"myewole’s (il, 2), "allagte’s [[Ymny’, Ulisse’, Yelaameecl’], Vunyawnloeiwe’s il} 


When you make a shallow copy of something, you copy only the references, not the 
actual values that the references point to. So if you make a copy of a dict by using the 
copy () method, the copy reflects any changes made to the list that the dictionary is 
referring to. 


>>> mycopy = mydict.copy() 
>>> mycopy 
{"myewjole’s (i, 2), “allisie’s [[vany), Ylist’, "chamcgecl"], “imymuniloee!s i} 


To make a copy of the actual values so that these changes don't occur, import the copy 
module and use the deepcopy() function. See Chapter 8. 


The standard use for setdefault () is for adding a list (or other mutable data type) to the 
dict and appending items to the list. Here's how the syntax looks: 


mydict.setdefault (key, []).append(value) 
Here's what happens: 

1. Ifthe key isn't in the dict, Python adds the key and uses an empty list as the value. 
Then it appends the value you specified to the list. 

2. Ifthe key is in the dict, Python tries to append your value to that key's value. (For 
this to work, the value must be a list because the append() method works only 
with lists.) 

REMEMBER If you don't specify a value, the value None is used. 


The following code creates a key, 13, and pairs it with a value of an empty list. Then it 
appends the string 'complaints' to the list: 
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SSS roois = {VilA)s || vesccqmmem: @limie’]), 12s | Yalouse" || } 


>>> rooms.setdefault (13, []).append('complaints') 
>>> rooms 
fVI2A' Ss | Varecumcine Ciainie!],, 12¢ [valouse’], as i’ eomoilasines” ] } 


The following code appends the string 'don't bother" to the list that has the key 13. 


>>> rooms.setdefault(13, []).append("don't bother") 

>>> rooms 

(V12A's Yaregumeme Cilliicie! ], 123 i elouse’], ge ["Gomolamncs”, “clove ie 
lovorelaveie || }} 


The following syntax does the same thing with a set. 


mydict.setdefault (key, sets.Set()).add(value) 


Building Dictionaries 

In Python, most compound data objects are created by processing data in a loop structure, 
so you'll often build lists, dictionaries, and so on by writing loops. Following are some 
loopy and non-loopy ways of building dicts. 


Converting other data types into a dict 


You have several ways to convert other data types into dicts. This section shows the 
dict () constructor, the fromkeys() method, and a for loop. 


Converting a list of tuples 


To convert a list of two-item tuples (or any iterable containing two-item iterables) to a 
dict, pass the list to the dict () method like this: 


pYS lisit of tuples = [(’spem’, 1), ("eggs", "sGreem’) | 
Pee CliGic (list oi cuss) 
{'eggs': 'green', 'spam': 1} 


Converting a sequence 


To create a dict from a list or tuple and an optional mapping value, use the fromkeys () 
method. (This works in Python 2.3 or later.) The items in the sequence become the keys, 
and the mapping value (None by default) is used for the values. In this example, each item 
in the list is turned into a key and given the value 'breakfast': 


SS> iivlaist = [soem , Yeqqs!’, Yorance mice" | 
>>> print dict.fromkeys(mylist, "breakfast") 
{'eggs': 'breakfast', ‘orange juice': 'breakfast', 'spam': 'breakfast"} 
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You can do the same thing by using the dict () constructor and a list comprehension (a 
line of code that creates a list from a range of items). 


SSS chi@ir( (a; Wowesalkitaisit’)) ite al alin imyyllaLSie)) } 

{'eggs': 'breakfast', 'orange juice': 'breakfast', 'spam': 'breakfast'} 

Tip If you won't be working with values later, consider creating a set rather than a dict. A 
set has only keys, no values. To create a set from a list or tuple, just type set () and 
include the list or tuple in parentheses. For more information, see "Setting Them 
Up," later in this chapter. 


To create a dict from a two-element iterable, you can also use a list comprehension. This 
code builds a dict of powers of 2: 


SSS Chbeic (| Gk, Keer 2)) Stew sk alin (2, 4, ©) ])) 
(ozs Ay, a Aue alt Ore w One eS. On) 


You can get the lowdown on list comprehensions in Chapter 16. 
Building a dict with keyword arguments 


In Python 2.3 and later, you can pass keyword arguments (**kwargs) to the dict () 
method and build a dictionary, like this: 


>>> sack = dict(cats=7, kits=7) 
>>> sack 
(RIAKSistetoale Sor eG ce Sou tome a) 


For more about **kwargs, see Chapter 11. 
Building a dict out of another dict by using a for loop 


To go through a dict while building a second one, use a for loop and assign each pair in 
the old dict to the new dict, like this: 


SSS ulyeier = {Vieeel joriee's 'SO0 joomncls!, Visecl ileimeitin’s "2 ieee eine! }} 
>>> newdict = {} 
>>> for key in mydict: 

. value = mydict[key] 

. newdict[key] = value 


>>> newdict 


{ Noecl ileingitin’ 9 2 sree lleineg!, Voeel jornee’s Us00) jsciinels” }} 
TECHNICAL In Python 2.1 and earlier, the in keyword doesn't work with a 
STUFF dictionary object; you need to use it with a dictionary method. So to 


write the above loop, you could use this syntax: 


for key, value in mydict.items(): 
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Updating a dictionary 


To replace the values in a dictionary with new values, use the update () method. The new 
key:value pairs can come from either another dictionary (or other mapping object) or an 
iterable of key-value pairs (such as a list of two- item tuples). 


TECHNICAL The iterable (key:value) option requires Python 2.4 or later. 
STUFF 


You can also replace values by using keyword arguments, as in this example: 


SS> cocis = {/iIZAys Venacummeimi elaine”, 122 Yalouse! } 

>>> rooms.update (A14='hit on the head lessons') 

>>> rooms 

{'12A': ‘argument clinic', 12: ‘abuse', 'Al4': 'hit on the head 
lessons'} 


Here's code that updates one dict by using key:value pairs from a second dict. If a key 
appears in both dicts, the second dict's values overwrite the first dict's values. The second 
dict is not modified. 


>>> rooms2 = {'Al14': 'canceled by Flying Fox'} 

>>> rooms.update (rooms2) 

eee oe OOMS 

{2A es “Varecummeiiic Cllainwe!, 123 Yalouse’, Wilts Vesmeeilecl low iilyaline; wes! } 


When Only a Dict Will Do 


This section gives brief examples of some real-world uses for dicts, including a game that 
retrieves values associated with keys, a cache for storing frequently used items, and a 
way of finding duplicate keys. 


Storing and retrieving values 


A dict is useful for storing and retrieving immutable objects and corresponding values, 
like product prices, for example. Here's a small game that uses a dict for this purpose: 


ChOLCS = Law AnjSuMiE(MCMOOSS SCLESOLS, LOCI, OF Pelseies |) 

wesuiltes = (Vsoissore’s “emic jocjoer, Yeoelk!s YerusinesS Seisis@rs!, 
"paper": "covers 

IOS" }} 


print choice, results[choice] 


Here's one round of the game: 


Choose scissors, rock, or paper: paper 
paper covers rock 
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You can also use if/elif/else statements for this purpose (see Chapter 10), but a 
dictionary is often better because it's built when a program first runs (which is more 
efficient) and it's easier to read if a lot of key:value pairs are involved. It's also easier to 
maintain and modify. 


Using a dict as a cache 


If you're using a for loop to process some items, a dict can store the results for reuse. 
This type of storage is called a cache. There are two ways to write caching code. The one 
you use depends on whether you're processing a lot of different items or mostly multiple 
copies of similar items. 


Tip The following examples refer to a function called get Value (). This isn't a function 
built in to Python; you'll have to write it yourself. 


Different items 


If many unique items probably won't be in the dict, write code like this: 


result = [] 
cache = {} 
for key in mylist: 
if key in cache: 
result.append(cache[key] ) 


else: 
value = getValue (key) 
cache[key] = valu 


result.append (value) 
Tip We use a cache because the get Value () function is expensive (that is, it takes a lot 
of time). However, you should avoid optimizing code when you don't need to—it 
makes code harder to read. 


Similar items 


The following code is efficient if you're processing many similar items, but it's harder to 
read: 


result = [] 

cache = {} 

Oe Ieyy alia) imyllalsies 

if key in cache: 

result.append(cache[key] ) 

else: 
# Add to cache and append to result 
result.append(cache.setdefault (key, getValue (key) ) ) 


Tip If you think that the keys will almost always be in the dict, the following is even 
more efficient. That's because try blocks take very little time if an exception isn't 
raised, but exceptions take a lot of time to process: 
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result = [] 
cache = {} 
for key in mylist: 

eeavas 

result.append(cache[key] ) 

RES OI IMSyildie woe ¢ 
Add to cache and append to result 
result.append(cache.setdefault (key, getValue (key) ) ) 


Dealing with duplicate keys 


Because setdefault () returns the value for a key, it's easy and efficient to write a loop 
that converts a list of key:value tuples into a dict in which each key contains a list of 
values. This is handy when you expect key duplications. 


This code processes a list of tuples and finds that two values share a key: 


SSS l= | (il, Vee tise), (5, Vemilclein aims!) , (il, Veaierienclee)) | 
>>> d = {} 
>>> for key,value in L: 

. d.setdefault (key, []) .append(value) 


SS. il 
{ie ["ereeue iermse!, Yjoeuccicces! |, Ss | Vepolclein seine” 1 } 


Setting Them Up 


Sets are a data type that supports mathematical operations like union, intersection, 
difference, and symmetric difference. 


What sets are and aren't 
Here's how sets fit in with the rest of Python's data types: 


e The set () type is built into Python 2.4 and later. If you're using Python 2.3, you 
must import the sets module before using sets. The data type in the sets module 
is called set () (with a capital S), not set (). 

e Like a dict, a set is an unordered collection of elements. 

e Unlike a dict, a set contains only keys (that is, unique, immutable objects). A set 
does not have values, and it is not a mapping type. 

e Unlike a sequence, the elements in sets aren't stored in any particular order, so 
sets don't support indexing or slicing. 

e In sequences, an element can occur multiple times, but a set contains only unique 
elements, no duplicates. 
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e Regular sets are mutable, and you can't use one as an element of another set or as 
a dictionary key. (But see the section, "Immutable or frozen sets," later in this 
chapter.) 


Here's an example of a set based on a list that contains some duplicate elements. The set 
contains no duplicates. 


>>> basket ['apple', 'orange', '‘apple', '‘pear', 'orange', 'banana'] 
>>> fruits = set (basket) 

>>> fruits 

set(['orange', 'pear', 'apple', 'banana']) 


Membership testing with sets 


A set is useful for comparing individual items against a group of elements. For example, 
if you have a set that contains all states on the Atlantic coast, you can check whether a 
user-entered state is on the Atlantic. 


Checking elements 


To test whether an element is in a set, use the in keyword, like this: 


>>> 'Maine' in atlantic states 
True a 

PeS VOwegem’ ain Gelameie Siraces 
False 


Checking subsets and supersets 


To test whether one set is a subset or superset of another, use the appropriate method 
listed below. Both return True or False. 


e a.issubset (b): Tests whether all the elements of set a are also in set b. 
e a.issuperset (b): Tests whether set a contains all the elements in set b. 


Finding set elements 


Set operators and methods can be used to combine sets and find their intersections and 
their unique elements. 


TECHNICAL Set operations have corresponding methods. In some cases, using the 
STUFF method syntax rather than the operator creates easier-to-read code. 


The following operations are the most commonly used. 
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Difference 


The difference operation compares two sets and returns a set containing the elements that 
are in the first set but not in the second set. The difference operator - uses this syntax: 


el = 19) 
The difference method uses this syntax: 


a.difference (b) 


The following example finds the set of round things that aren't pink: 


SSS icelacltianines = || enesince!’, Venues’, Veco", VVeiies! | 
SS> joaliahecloalines = [[Vilsjosicikelk!, Vejuimloatli’, Vioeilchy lolleunlesic” | 
>>> a = set (roundthings) 


>>> q = set (pinkthings) 
SSS 2 = C6 
set(['orange', 'egg', 'tire']) 


Union 


The | operator returns the union of the sets (all the elements from both sets). The union 
method uses this syntax: 


a.union (b) 


This example finds the set of all round things and all pink things: 


SSS a || ef 
set(['lipstick', 'tire', 'orange', 'gumball', '‘egg', ‘baby blanket']) 


Intersection 


The « operator returns the intersection of the sets (the elements that are present in both 
sets). The intersection method uses this syntax: 


a.intersection (b) 


This example finds all the round pink things: 


>>> a & q 
set (['gumball']) 


Symmetric difference 


The * operator returns the symmetric difference of the sets—elements in 


144 


e Exactly one set 
e Not both sets 


The symmetric difference method uses this syntax: 
ALS yMMISwLS ChiriiSwSmcsS (5) 


This example finds all the things that are either round or pink but not both: 


SSS a ~ CE 
sec (| Voweinge!, Vece", Vilijosicielk!, “ies! , Viosloyy lolleiaieie || )} 


Immutable or frozen sets 


Python 2.4 has a built-in data type called frozenset. It's similar to set, but it's 
immutable (that is, after you've created one, you can't change it). Because a frozenset is 
immutable, it can be used as a dict key or as an element in a set. 


In the following example, a frozenset instance is used to associate a three-item set key 
with the value None: 


SSS mylliisit = [Ytoe!, Vee’, 'sjoeim’ | 
>>> s = frozenset (mylist) 
SSS 


LeOw@eMseic (| Vi@G!, Vos’, ‘soem’ | } 

>>> d = {s: None} 

SSS 

{frozenset(['foo', 'bar', 'spam']): None} 


If you try to use a regular set as a dict key, Python builds an immutable set from your set 
and uses that set as the dict key instead. 


TECHNICAL Python 2.3 has a similar subclass of sets called ImmutableSet. It is 
STUFF available if you import the sets module. 
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Part II: Structures 


Chapter List 


Chapter 10: Staying in Control 

Chapter 11: Fun with Functions 

Chapter 12: Building Applications with Modules and Packages 
Chapter 13: Getting Classy 

Chapter 14: Introducing New-Style Classes 

Chapter 15: Feeling Exceptional 

Chapter 16: Tackling Some Advanced Features 


The 5" Wave By Rich Tennant 


"We're here to clean the code." 
In this part... 


This part covers Python's code blocks, the chunks of code that manipulate data and return 
results. 


If you've ever seen a flow chart, you'll be comfortable with Chapter 10, which describes 
if statements and loops... loops... loops.... 
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Chapter 11 takes you inside the magic boxes called functions. You'll discover how to put 
stuff into them and how to get stuff out of them. 


Modules and packages are the Lego blocks of Python programs. Chapter 12 shows you 
how to build and work with both module files, which store chunks of code that work 
together, and packages, which are directories that contain related modules. 


Lucky Chapter 13 inducts you into the temple of object-oriented programming. Classes, 
like factories or templates, create other objects that do the actual work. After you read 
this chapter, you'll understand what it's like to be a capitalist baron who controls the 
means of production. 


Classes are such an important concept that we couldn't get away with just one chapter 
about them. Python 2.5 introduced new-style classes. You don't have to use them, but if 
you want to, Chapter 14 is where they are. 


What do you do when something weird happens in your Python program? Chapter 15 
offers advice. Here, you find out how to anticipate and handle exceptions in your 
programs. 


If you've devoured the rest of the chapters in this part and you're still hungry, turn to 
Chapter 16. Here, we introduce some advanced features of Python programming. Even if 
you don't want to use these features yourself yet, you'll appreciate being able to recognize 
them in other people's programs. 
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Chapter 10: Staying in Control 


Overview 


With Python, you use control structures to determine whether a chunk of code runs or 
how many times to run that code. 


TECHNICAL This kind of code is also called a compound statement. A compound 
STUFF statement is a kind of code block (lines of code that perform an 
action). 


In this chapter, we show you 


e The condition and comparison operators that control structures use 

e Tests you can do on data to decide whether and how code should run 

e How to use if blocks to test several specific conditions 

e When and how to use for and while loops to execute code repeatedly 


hings to Know about Control Structures 


This section describes some general things you need to know about how control 
structures work. It applies to all the structures in this chapter. 


e Acontrol structure starts with a keyword—if, for, try, while, or with. The first 
line also contains an expression and ends with a colon. 

e You can nest control structures. So an if statement can be nested inside a for 
loop, and vice versa. (You can also nest for in for and if in if.) 

e Python uses indentation to group statements together. 


Each line within a code block must be indented by the same amount—the 
convention is four spaces for the first block, eight spaces for any block nested in 
the first block, and so on. 


e When you enter a compound statement (a statement with more than one line) in 


interactive mode (see Chapter 2 for instructions), you need to end it with a blank 
line (a Return). This tells the parser that you have typed the last line of the block. 
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All about Conditions and Comparisons 


Control structures use conditions and comparisons to tell Python when to carry out 
actions. A condition tests whether some expression is true or false. A comparison tests 
whether one value is larger than, equal to, or smaller than another. This section covers the 
truth values of objects and the operators that you use for conditions and comparisons. 


The value of truth 
In Python, every object has a truth value—a Boolean value of true or false. 


REMEMBER In Python, true and false don't work quite the same way they do in the rest 
of life. For example, any string except an empty string evaluates as true. 
So you could tell Python 'I am a billionaire', and it would be true! 
True and False aren't true and false 


Python 2.2.1 and later have built-in Boolean data types True and False, which are 
associated with the bool () built-in function. (Other functions and methods also return 
Boolean results, such as isdigit ().) Comparison expressions, such as x < y, also return 
True and False. 


History geeks may be interested to know that previous versions of Python used 1 and 0 
rather than True and False because that's how it works in the C programming language. 
The odd result of this is that the True and False data types are integers: 


SSS il s+ Wene 
2 


It's important to remember that the Boolean data types are not the same thing as the truth 
values of objects returned by Boolean operators. When you check the truth value of 
objects by using Boolean operators (for example, by using x and y), you are checking 
whether the objects are empty. The Boolean operators return a Python object (for 
example, in x and y, the object x or the object y is returned); they don't necessarily 
returm True OF False. 


False objects 


All Python objects evaluate as true except for null, empty, and zero objects. The following 
objects evaluate as false: 


¢ False (one of the two Boolean objects) 
e Zero (0) (for any numerical data type) 
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e¢ Empty string '' 

e Empty list [] 

e¢ Empty tuple () 

e Empty dictionary {} 

e Empty set set () 

¢ None (a special Python object that stands for "no value") 


Testing an object for truth 
Testing whether an object is true or false is simple. Just use an if block: 


1. Type it followed by a literal or the name of an object. Then type a colon and 
press Return: 

Mao aig mallet ge 

3. Type four spaces, and then type something for Python to do. Press Return 


twice when you're finished. 
5 See alge Weis 
5 joiealine Myagiy 


HHO oO 8 


. yes! 


Python performs the action only when the object evaluates as true. This test does not 
assume that the object is any particular data type. Because you're not comparing the 
object against anything else, you can think of it as "letting the object evaluate itself." 


TECHNICAL it Mame Of GloieCrs 1S ShOKt HOw Li loO@l (meme OE _GlojECE) 
STUFF caeebiaal 


Testing the elements of a sequence object 


In Python 2.5, new built-in functions allow you to test the truth values of multiple 
elements of a sequence object (such as a list) or other iterables at once. 


allQ) 


The al1() function returns True if all the elements are true. It also returns True if the 
sequence object is empty. For example: 


Soh uivylliset = fil, 2, 3] 

SS SLL (my llatsiz)) 

True 

SS> mlyotlnewilast = (0, 2, 3] 
>>> all (myotherlist) 

False 

>>> myemptylist = [] 

>>> all(myemptylist) 

True 


150 


any(Q) 


The any () function returns True if at least one element is true (the object must not be 
empty). For example: 


>>> any (mylist) 

True 

>>> any (myotherlist) 
True 
>>> any (myemptylist) 
False 


Boolean operators 
You use Boolean operators to test the truth values of objects (to test whether objects are 
empty). You can test more than one object at once, and you can set up conditions that 


depend on the truth value of one object or the truth value of all the objects you're testing. 


REMEMBER The Boolean operators are evaluated in this priority order: 


1. not 
and 
3. or 


The operators and and or are short-circuit operators. Python evaluates expressions with 
these operators from left to right and stops as soon as the outcome is determined. Python 
returns the object it evaluated last and skips the rest of the expression. 


All about or 


The or operator stops evaluating as soon as it finds something that's true. In the following 
example, the first item is true, so processing stops there and Python returns the first value. 


Sse a oi 4 or 5 
3 


In the next example, the first item is false, but the second item is true, so processing stops 
and Python returns the second value: 


SS> 0 o@ iL oi 2 
1 


If no item evaluates as true, the last (false) item is returned. 
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Embrace the power of and 


The and operator requires ail the conditions to be met for the expression to be true. 
Evaluation stops either when an object tests false or when all items have been tested. 


Tip The following examples show the results of two expressions using and: 


e This example tests all the items and returns the last value: 

e >>> 3 and 4 and 5 

ee 

e This example stops after testing the second item (0) because it tests false: 
Oo 6S 3 ame © ame 5 

e 60 


Comparison operators 


Comparison operators test whether something is equal to and/or greater than or less than 
something else. Table 10-1 lists the comparison operators. 


Table 10-1: Python's Comparison Operators 
=™>Open table as spreadsheet 


Operator Usage Meaning 

S x < y x Is less than y 

s= See x 1s less than or equal to y 

P aie x 1s greater than y 

C= x >= y x 1S greater than or equal to y 
l=, <> x != y,x <> y xis not equal to y 

r a 4 x is equal to y 


REMEMBER All comparison operators have the same priority. Statements with multiple 
comparison operators are evaluated from left to right. 


The difference between equals and equals equals 

In Python, = (the assignment operator) gives a name to an object. The = operator is part of 
a statement. If you want to test whether something is equal to something else, you must 
use the == operator. 


Is you is or is you ain't my baby? 


There are two special comparisons: is and is not. These don't test the values of x and y; 
they test the identities of x and y—that is, whether x and y are bound to the same object. 


Warning To avoid problems, make a habit of using == and != to compare objects' values 
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rather than using is and is not to compare their identities. Usually, you use is 
and is not only when you want to test whether something has the special value 
None. (There is only one None.) 


Get in! 


The comparison operators in and not in check whether a value occurs in a sequence 
data type (a list, tuple, or string), a dict, or a set. 


The following example checks for the text 'fox' ina set: 


SS> claire = seie( eae, Memmnek", Moroni", seoxx' || }) 
SSS slic Witep! cid Clencers 
~ jOiedine “Cenlil omic jelne Inouiaels ! 4 


Call Ours eine lhowimels ! 
Tip If your program works with lots of data, it's a Good Idea to use dicts or sets to store 
the data. Searching dicts and sets is faster than searching sequences. 


Comparatively speaking 


Comparisons can be linked in a chain. The follow example tests three values. Both tests 
must be true for the comparison to be true. 


SSh at Tx 2 < 38 
=, jOi@aliane Wyeast 


yes! 


TECHNICAL Linked comparisons can be hard to read. In some cases, combining 
STUFF comparisons with Boolean operators (and, or, not) and parentheses 


will improve readability of your code. Compare these pairs of 
expressions and decide which are easier to read: 


it a < ly eiacl Io < © 

nbier wey << Joy << Iele 

ie el < lj S CO < els 

ice (a < Io) a@inel (lo > €) eincl (Ee «< el): 


The Boolean operators have lower priorities than comparison operators—that is, they are 
evaluated after comparison operators. You can use parentheses to make sure comparisons 
are evaluated in the order you want. 


Feeling Iffy 


An if statement is an instruction that Python carries out only if a condition is true. You 
often use a comparison operator to test the condition in an if statement. The condition 
must be either true or false (no gray areas here!). This is referred to as Boolean 
comparison. 


ies) 


Tip You can translate an if statement into English as follows: "If this is true, then carry 
out the instructions below." 


Writing an if statement 
To write an if statement, follow these steps: 


Type if, a condition, and a colon, like so: 

if weather == "raining": 

On the next line, type an instruction about what to do if the condition is true. 
if weather == "raining": 

bizimeg wilozelila = “wes 


Cir OW is — 


Be sure to indent this line (and any other lines of the block) four spaces. 


Warning You might see single-line code blocks combined on one line like this: 


iit weeither == Viegiimiime'"s lorie umloreilila = “yes! 


We don't recommend using this style in your own code. It's harder to read, and probably 
won't be allowed in Python 3.0 and later. 


Adding another condition to an if block 


If you want to test several conditions, you can include additional code blocks to an if 
statement. These blocks begin with the keyword e1if (which is short for "else if"). The 
elif statement says, essentially, "Otherwise, if this other thing is true, then carry out this 
other instruction." 


Warning Crucial code that always has to run doesn't belong in if or elif code. Python 
reads if blocks (if and e1if statements) in order. As soon as it finds a true 
condition, it stops evaluating. The rest of the block is not executed. 


To write an if block that includes an e1if statement, follow these steps: 
1. Write an if statement. 


Tip Use the instructions in the "Writing an if statement" section, earlier in this 
chapter. Your code might look something like this: 


if weather == "raining": 
jicime wiloicelila = “wos” 


2. Without indenting, on the next line, type elif, a condition, and a colon (:). 
3. if weather == "raining": 
4, lozime wnilorellea = yes 

5) lif weather == "windy": 
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Tip If you are typing in a program that automatically formats Python code, you 
might have to press Backspace or Delete one or more times to make sure that 
this line is not indented. This is called dedenting. 

On the next line, indent four spaces and type an instruction, like so: 


if weather == "raining": 

picling tmilowella = “wes 
lif weather == "windy": 

icing jacwecn = wees 


You can type more instructions. Be sure to indent any additional instruction lines 
four spaces. 


Adding an else statement to an if block 


An else statement always comes last in an if block. It means "If none of the preceding 
statements was true, then carry out this final instruction." 


Warning Don't put critical code in an else statement. The else code gets executed only 


if none of the other statements in the if block is true. 


To write an if block that includes an else statement, follow these steps: 


1. 


2: 


Write an if statement. 

Tip If you have only one condition to test, skip to Step 3. 

If you have additional conditions to test, write one or more c1if statements. 
Use the instructions in the section "Adding another condition to an if block," 
earlier in this chapter. 

On the next line, type else:. 


This text should line up with if. You might have to dedent if you're typing in a 
program that formats Python code automatically. 


Go to the next line, indent four spaces, and type at least one instruction. 


> GO Sulic = “new! Wineie eee you Wealtcime ieee 


Be sure to indent any additional instruction lines four spaces. 


Your completed code block might look like this: 


if weather == "raining": 
sieiing wiiloieelile = “wes” 
lif weather == "windy": 
Ssizilimg Jacker = “wes 
elses 
go_out = "now! what are you waiting for?" 


ice, 


Combining tests 


You don't always have to write elif statements when you want to test multiple 
conditions. If you want to perform the same action in more than one condition, you can 
combine the conditions in a single if or elif statement. 


The code snippets in Listings 10-1 and 10-2 both test for the same conditions, but the 
example combining multiple conditions in the if statement is easier to read: 


Listing 10-1: elif block 
ie sige < iis 
jeanne “MDa Seoulinc ieee” 
elif age > 65: 
print "Discount rate" 
else: 

print "Adult rate" 


Listing 10-2: Two conditions in an if statement 


if (age < 18) or (age > 65): 
print "Discount rate" 
else: 
jorealinie “Aohulie seetieS” 


Staying in the Loop 
Computers are good at doing things over and over again. Humans tend to find repetitious 
tasks boring. That's why we program with /oops. Loops are code blocks that tirelessly 


repeat until the task is finished. Python has for loops and while loops. Both kinds of 
loops have these things in common: 


e They start with a keyword (for or while), followed by an expression and ending 
with a colon. 


e They are code blocks, and the code inside them is indented. 


TECHNICAL The right loop for your task depends on several factors. See the 
STUFF section "Choosing Your Loop" later in this chapter. 


For a Good Time... 
A for loop is a way to repeat some instructions (a code block) a number of times. In this 
section, we explain the workings of the for loop and discuss iterables, which are objects 


you can use to generate a for loop. 


How for works 


Here's a basic for loop: 
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Or myCchamy dene nedelo st: 
print "the ascii value of", mychar, "is", ord(mychar) 


The preceding example shows all the important features of a for loop: 


e The keyword for 

e Aname that holds a value each time through the loop (this name is sometimes 
called the /oop target) 

e The keyword in 
TECHNICAL Note that in used in a for loop works differently from the 


STUFF Boolean operator in (see the "Get in!" section earlier in this 
chapter). Here, in is used to separate the loop target from the 
iterable. 

e An iterable (a sequence or iterator object)—in this case, the string 'hello!' 
¢ Acolon 


e An indented statement 


In a for block, Python gives the name to each element in the iterable, one at a time, in 
order. It reuses the name each time through the loop. After Python gives the name to a 
new element, it executes the instructions in the code block. 


The example at the beginning of this section computes and prints the ASCII value of each 
character in a string (using the ord() function): 


the ascii value of h is 104 
the ascii value of e is 101 
the ascii value of 1 is 108 
the ascii value of 1 is 108 
the ascii value of o is 111 
Ee ascisd yalue of |! as 33 


Iterables, getcher iterables! 


A for loop works only with objects that have, or can generate, multiple elements. Such 
objects are called iterables (because a for loop generates multiple iterations, or repeats, 
using the elements in the object). 


TECHNICAL Manipulating an object with a for loop is also called iterating over a 
STUFF sequence or looping through a sequence. 


Here are some of Python's objects that can be used with a for loop: 


E lement in range(1,4): # range 
rint element 


e 

Pp 

element in [1, 2, 3]: # list 
print element 
k 
Pp 
1 


For Kkexy atin {Veme’ sil, Vewe"s2)}¢ 4 cli@ilomaiey 
rint key 


ine in open("myfile.txt"): # text file 


la? 


print line 


for value in mydict.itervalues(): # iterator object 
print value 
for key, value in mydict.iteritems(): # tuple unpacking 


print key, value 
Home on the range 


It's common to use a for loop to iterate over a sequence of numbers by using the function 
range (), which generates lists of arithmetic progressions. 


To use the range () function, feed it the following arguments: 
e Required: An integer specifying the end of the range 


REMEMBER The range () function generates a list of numbers up to but not 
including the end-of-range number. For example, if you want a list 
of all the single digits, you would use 10, not 9, as the end of the 
range: 


>>> range (10) 
IO, te 2p 3p Be Se Op Ve Bo 9 


e Optional: An integer specifying the start of the range. It goes before the integer 
that specifies the end of the range. If left out, it defaults to 0. 


This example shows a range () function with a starting number of 2. 


>>> range(2, 10) 
[2p 3p 47 Sy Gp Ty By YI 
Optional: An integer to add to the current number 


e Optional: An integer to add to the current number to generate the next number. 
(This is called a step.) If left out, it defaults to 1. 


If you include the step, you must include the start-of-range number. 


This example shows a range () function with a step of 3. 


>>> range(0, 10, 3) 
[07 37 O, BI 
Tip You can use range () with negative numbers, too, as shown here: 


SSS emcGe(—10, ~LOO, =30) 
[=10, =40, =—70] 
Howdy, xrange-r! 
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The xrange() function is similar to range (), but instead of returning a list, it returns an 
iterator object that generates the numbers in the range. 


Loops using xrange () are a bit faster and use less memory. 


You can see Python's internal difference between range () and xrange() by comparing 
the following. Note that Python converts the xrange() object to an internal "equivalent": 


>>> range(0, 10, 3) 
[O, 3, G, DB 

>>> xrange(0, 10, 3) 
xrange(0, 12, 3) 


Using the range() function with a list 


To iterate over the index numbers of a sequence (such as a list), use a for statement with 
range() and len() (which finds the number of elements in a sequence), like this: 


foe 1 im camge (lem (lisic meme) )) 3 


The enumerate () function, new in Python 2.3, works similarly. See the "Numbered and 
approved" section, later in this chapter, to find out more. 


Warning It isn't safe to modify the list you're iterating over! To duplicate selected items 
or otherwise modify items in a list you're using in a loop, either 


e Use awhile loop (see the next section). 

e Create a new list to store the modifications, as shown in this example: 
O S2> iam imem st ives = ["wayes!’, ‘sacks’, "Gars’, Vkiies” | 
e >>> newlist = [] 

O S22) fOr i im Mem trom Sitt_aweses 

e soo ives ae [yy ah] 


e >>> newlist 
e [ee Swaviciste 7 caicko a age Veakse ale Uicircn | 


Whiling Away 


A while loop performs an action an indefinite number of times, as long as a condition is 
true. 


Warning If the condition is always true, the while loop never stops (unless a return or 
break statement is encountered, or an exception is raised). 


To write a while loop, follow these steps: 
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1. Type while, a Boolean expression, and a colon (: ). 


Here's an example: 


Wide mimloeie OF looccles > 03 


2. On the next line, type four spaces and then type the action you want 


repeated, like this: 
35 While mumlser ot loortles > Oz 
4, DEIME muMosie OF botcles, “Isocrtles Of leer om tine welll 


If you add more lines, be sure to indent them. 


Tip A while loop is useful in a program that waits for user input. The program in Listing 
10-3 reports on the results of several "coin tosses." The while loop checks whether 
the user has asked to quit the program. 


Listing 10-3: Coin toss program 


import random 


headcount = tailcount = 0 

userinput = ''! 

jociime “NOW COSSiLiMC) 2 Coli, 59" 

while userinput.lower() != "q": 

Filia = eincloim,cmoieS (|| Yineacs", Weaidlls\ |) 

if flip == 'heads': 

headcount += 1 

print "heads! the number of heads is now %d" % headcount 
else: 

jeeuLikeeiwinie a= i 


fo) 


print "tails! the number of tails is now %d" % tailcount 
juciimt “Press Ye’ tte craic’, 

USSwLMUE = Kaky_alinjewic(YOr amMoOtlNer key tO toss agelms'’) 
print "the total number of heads:", headcount 

jpcime “Nte Coral mumloer or tadilse", taileoumir 
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Choosing Your Loop 


Python has two kinds of loops: while loops and for loops. In general you should use a for 
loop, but there are a few specific situations where a while loop is better. This section 
explains when each type works best. 


What for's for 
Python emphasizes for loops, which are very flexible and powerful. 


REMEMBER Most of the time, when you want to repeat a block of code more than once, 
you should use a for loop. Use for loops in the following situations: 


e¢ When you want to iterate through a sequence one item at a 
time 
e When you want to repeat a loop a certain number of times 


e¢ When you're reading a text file 
REMEMBER A for statement has two parts: 


e An iterable, which provides elements one at a time 


In this for statement, the iterable is range (10): 


SSS Oise a im weunge (0) ¢ 


e A loop target (a name), which is given to each value that the 
iterator generates 


In the above for statement, the loop target is i. 


The for loop is so flexible because there are many ways to use the loop target name in 
the body of the loop. For example, you can give a new value to the name inside the loop 
(as long as you don't mind if that value goes away when the loop restarts). This code 
gives a new value to i: 


SSS i@ie a iin ieeinee) (5) $ 
Swe? ak, Be Sie 
. i = "more spam!" 
.. else: 
5 a = Wig ileeiiacyla | Y 
= jeuealioie al, 


more spam! more spam! more spam! bleargh! bleargh! 
Warning The other side of the coin is that if you give a new value to the loop target, you 
can't expect the value to stick. This code doesn't print what you might expect, 
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because i gets reassigned to a number when the loop restarts: 


>>> for i in range(5): 


o jOewiMie a, 
Nise nal. Sen She 
i = "more spam!" 
else: 
i = "bleargh!" 
OP a ee 
Why to while 


The while loop is useful for these situations: 


e Event-driven programs: Many programs wait for a user to take an action (such 
as clicking a mouse button), and then respond to the action. Often, a main while 
loop collects user actions (events) and sends them to chunks of code that act on 
particular events (event handlers). 

¢ Loops that need to run an undetermined number of times: If you don't know 
how much data you need to process—for example, when you're reading a binary 
file—you might use a while loop that repeats its action until it detects that there 
is no more data. 

e Processing a container object while modifying it: The spider.py program in 
Chapter 4 includes this kind of while block. The block uses the pop () list method 
to removes items from the links to process list. A while loop is used because 
no matter how long links _to_ process 1s, it will run as long as there are items 
left in the list. However, it's important that the while block removes items from 
the list. If it didn't, the code would run forever. 


O walle s@llir, limes te processs 

O wel = séli, links co process pop () 
e 6©self.log("Retrieving: " + url) 

QO SEINE PIKOCSSS| OearerS (well) 


The while loop, because it includes a Boolean expression, can stop unexpectedly if it 
encounters data (such as None) that evaluates as false. This while loop stops in the 
middle of processing the list because data is given the value None, which evaluates as 
false: 


>>> def process (a): 
5 jOIeLMIE ely, 


SSS imvyllisc = fil, 2, Nos, 4, 5] 
>>> data = mylist.pop() 
>>> while data: 
. process (data) 
data = mylist.pop() 
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The correct way to process items from a list in a while loop is as follows: 


SSS umllisic = [l, 2, None, 4, S] 
>>> while mylist: 
. process (mylist.pop()) 


Sy “4 Nome 2 ib 


Loopy Statements and Functions 


This section describes some special statements that work with loops. It also details 
several useful functions that work with loops. Some of these statements and functions 
also work with other code blocks. 


Useful looping statements 


Several statements help you get the most out of loops. Each of the following statements 
works as part of a loop code block and goes on a line by itself. 


Take a break—or else! 


The keyword break stops the processing of a loop and exits the loop. Any code in the 
block that comes after the break statement is ignored (any else clause is also skipped). 
If the loop is nested inside another code block, the program goes back to the block that 
the loop was nested in. 


In a loop, the break is usually part of a nested if statement. In the following example, 
break exits the if loop when the element has the value 'n'. 


SSS yy = Vawsicicaliag” 
SSS iO al iid ws 
> abies al St ier 
5 joiesioe MVoreoes fv 
. break 
= jeueajinie al, 
Sejohgakane | Ue 
By ge ie ae al Joes | 


You and who else? 
Loops, like if statements, can have an else clause that executes when the condition in 
the loop evaluates as false. One way to use the else clause is with a nested if sub-block 


that has a break statement in it. It works like this: 


1. Each time around the loop, the condition in the if statement is tested. 
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2. Ifthe condition is true, the break statement causes everything else in the loop 

(including the else block) to be skipped. Control exits the loop. 

If the condition is false, the loop starts over again. 

4. Ifthe loop iterator runs out before the if condition evaluates as true, the for 
loop's else clause gets executed. 


1S) 


The following code determines whether a string (in this case, 'larch') contains any 
lowercase vowels: 


min Sicicasliate; =  Iewecila™ 

vowels = set('aeiou') 

for a in mystring: 

if a in vowels: 

print mystring, "has a vowel" 
break 
Silses 
print mystring, "does not have a vowel" 


The result is: 


larch has a vowel 
Let's continue 


When Python sees a continue statement in the middle of a loop code block, it skips the 
rest of the code in the block and goes back to the top of the loop. Here's how it works. 
This code block prints both "5" and "five" because the print n, statement is executed 
each time through the loop. 


SSS Or mM wist weimee (110) 2 


if n == 
5 joes “irae”, 
5 JOINS in, 


GLAS 2 tive 5 6 7 2 


When we add continue to the if block, Python skips the last print n, statement when 
the loop condition evaluates as true; therefore, it prints only "five": 


SSS ioe in ali iene (i110) 3 


aie ia) = 

5 joueslime “itawye , 
continue 

+ (OLMIE i, 


Qs 2s Ss Aetive 1607) 399 
Tip Use continue statements sparingly because they can make your code hard to read, 
especially if they are nested in complex structures. 
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Thanks, but I'll pass 


The pass keyword tells the Python interpreter to do nothing. It is usually used in class 
statements, but you'll also find it useful in while or for loops when the program requires 
no action. In this example, the argument_clinic() function does nothing if its argument 
is greater than or equal to 5: 


SSS Clit euccimemic Cllimie (immEes) g 
. 1f minutes >= 5: 


so Pass 
. else: 
jouealinie MiN@y atic. alia ie)” 
>>> minutes = 4 


>>> argument clinic (minutes) 

ING ake aisia ie! 

>>> minutes = 5 

>>> argument _clinic(minutes) # nothing happens 

ee 

Tip The pass statement is also useful when you are starting to build a program. You can 
create the skeleton of the program's code blocks by using pass and then fill in what 


the blocks do later. (But don't forget to take out pass!) 


This example code will eventually solve all the world's problems, but for now it's just a 
placeholder: 


def grand_solution (problems) : 

mum solve all the world's problems here """ 
# We will figure this out someday 

pass 


Loopy functions 


This section describes some functions that are especially useful when working with 
loops. 


TECHNICAL All the functions described in this section work with iterables in 
STUFF general, not just with sequences. 


Numbered and approved 


In Python 2.3 and later, use the enumerate () function in a loop to return the index 
(position) and value of each element. Follow these steps: 


1. Type the following, substituting your own iterable for the example list: 
2 deene al, Wr alin, Slotwinsscencs (Vina, Vice’, Vicrexs! I) )} 2 


3. On the next line, indent four spaces and type some code that you want 
Python to execute. 
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ay, iE@Ne aL, AW akin, Glowwiiescence (Vine, “cael, Meee ]))) 2 
Bie joeilione ak, W 


Here's the result of the code: 


@ tie 

i iar 

2 ic@e 
TECHNICAL Technically speaking, enumerate () returns an iterator that creates an 
STUFF (index, value) tuple for each data item. The tuples generated by the 


preceding code are (0, 'tic'), (1, 'tac'), (2, 'toe'). 
The two-loop tango 
The zip() function, new in Python 2.0, loops over two or more sequences at the same 
time. It returns a list of tuples in which an item in each sequence is paired with the 
corresponding items in the other sequences. 


Tip Think of a zipper zipping together two sets of teeth! 


To write a for statement that zips two sequences, type the following (substituting your 
own sequence names): 


HOw 2, 19 im Zio (SSCGuUeENeS OMe, SEEMEMES Ene) 3 


The following example uses the zip() function in a for statement to explain the rules of 
the game "rock, paper, scissors": 


SSS willis = | Veeelk!, Vosjoee’, Vseissonrs" | 

SS uivylise2 = | isciss@ere!, Umo@elk!’, “oajoeie’ 

SoS uylliisics = | seer), Vsoussoms), Ymeelc! | 

SSS iri 2, lo, © aia wijo(mmylisicil, mylisic2, imyllatsicS) ¢ 
» joie Vas loseics msi lowe ino as" | (a,lo;e) 


rock beats scissors but not paper 
paper beats rock but not scissors 
scissors beats paper but not rock 


Gentlemen and ladies, unpack your tuples 

A for loop is a convenient way to unpack the values in a tuple. A canonical example of 

tuple unpacking uses the iteritems() method of dictionaries, which first turns each 

key:value pair into a two-item tuple. 

Tip We use items () in this example to show how the transformation to a list of two-item 
tuples takes place. In actual code you should use the iteritems() method instead. 


The result is the same, but it's more efficient. 


This example shows how unpacking works: 
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Soe Cheese imvemeony = (INecl Walaclsor’s Vitine wan lomoke clowm" , 
"Caerphilly': 'on order', 'Red Leicester': 'fresh out'} 

>>> cheese inventory.items () 

[('Red Windsor', 'the van broke down'), ('Caerphilly', ‘on 
order'), ('Red Leicester', 'fresh out')] 


You can use the following for loop with the iteritems() method to unpack the tuple 
elements: 


PY ior key, walue im Cheese imwemcory. LieSiciicems (() g 
print key, ":", value 


Red Windsor : the van broke down 


Caerphilly : on order 
Red Leicester : fresh out 


Going backward 


The reversed() function returns an iterator that goes through the sequence items 
backward. The reversed() function is new in Python 2.4. 


The following example uses reversed() with the range () function as its argument to 
count down through the sequence: 


for f in reversed(range(1, 11)): 
HONE ATONE GE 
poueaione “loylLensiic@ve sr | 


Here's what it prints: 


10987654321 blastoff! 
Out of sorts 


The sorted() function returns a new sorted list while leaving the source list unchanged. 


It works with both lists and immutable sequence objects. This function is new in Python 
2.4. 


SSS mydlaise = fil, 10; 2, 9 34 8, 2 Ir Se GI 
>>> mynewlist = sorted(mylist) 

>>> mynewlist 
[ly 2, 3p BS By Op Te Be 9, LO] 

>>> mylist # original list is unchanged 

[dy IO, 2, 9, 3, 8, 42, Tr By Sl 

TECHNICAL In contrast, the list method sort () changes the original list in place. 
STUFF 
SS> myllilsic. sowie () 

SSS mh Ilse 

[l, 2p 3p By By Ge Te Bp 9, LO] 
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To use the sorted() function in a for statement, type the following (substituting the 
name of your sequence object): 


SSS mms = (YeinY, VettiencemY, Vor, Vemuccoseia!, Yengedlilow'!, 2) 
>>> for f in sorted(mytuple): 
5 jOiaLiMe IE,, 


2? african an european or swallow 


Chapter 11: Fun with Functions 


A function is a block of code that performs a specific action and returns a result. 
In this chapter, you find out about calling functions, writing functions and their 


docstrings, and passing arguments to functions. You're also inducted into the arcane 
mysteries of the namespace. 


I Love Chunky Code 


Here's how functions make your life easier as a programmer and make things easier on 
people who read and use your code, too: 


e Functions let you write code only once. 
When you create code that you want to reuse, be sure to define it as a function. 
This saves you from typing the same code over again when you want to use it in 
another program. 

e Functions hide unnecessary complexity from the user. 
No one ever has to care what this function does unless it's included in your 
program. And even if someone does use the function, they don't have to care how 
it works, as long as it performs the way its documentation says it will. 

e Functions make your code easier to understand. 
If you name a function according to what its code does, then someone reading the 
program just has to look at the name to get an understanding of what the program 
is doing at that point. (Of course, you should also include comments in your 
program and comments and docstrings in your function to further explain what's 


happening.) 


e¢ Functions help you organize your program logically. 


168 


Writing most of your code chunks in functions helps you organize your program's 
structure logically, and that can make your programs easier to understand. 


Calling a function 


Using a function is known as calling the function. When you call a function, the 
function's code runs and returns a result. 


To call a function, follow these steps: 
1. Type a name to hold the result of the function followed by an equals sign (=). 


You don't have to do this, but usually you want to do something with the function 
result, and it's easier if a name has been given to the result. 


2. Type the name of the function. 
3. In parentheses, give the function information about what you want it to act 
on. 


Giving a function information is called passing arguments. Some functions don't 
take any arguments, but most do. 


You can pass the function a value or a name. 


In the following example, the sum() function is given one argument: the name of a list 
(purchases). The name total is given to the result: 


Parameters versus arguments 


In this chapter, we use the terms parameter and argument when discussing the data that a 
function needs to do its work. Here's a way to understand the difference: 


A parameter is a placeholder for the data. When you are writing a function, you use 
parameters to tell the function to expect data to be passed. Then you use the parameter 
name inside the function to stand for that data. 


An argumentis the actual data that's passed when you call the function. Here's an 
example: 


>>> def a_function(a_param) : 
print "my parameter is a_param 
and my argument is", 
repr (a_param) 


>>> a_function("my arg") 


my parameter is a_param and my argument 
is "my arg" 
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TECHNICAL Functions versus methods 

STUFF 
What's the difference between a function and a method? Not much. A 
method is a kind of function—both perform a specific action on some 
information you give them. However, a method is associated with a 
particular object and performs its action on that object. Often, that 
object is a data type. For example, a list comes with the append () 
method, which lets you add items to the end of the list. Whenever 
you create a new list object by using a statement such as my list = 
[1,2,3], the object my list has access to all the methods for list 
objects. 


Functions, on the other hand, stand on their own. You don't have to 
create an object in order to use one. 


You call functions and methods using different syntax: 


¢ To use a function, type the function name: 

e my function (x) 

e To use a method, append it to the name of the object, 
separating the name and the method with a dot, like this: 

e my list.append(x) 


SS> pueciases = (15.02, 2.99, 4-45] 
>>> total = sum(purchases) 

SSS joriatc tora 

20.46 


Defining a Function 


Unless you're using a built-in function, you need to define the function before you can 
call it. These are the basic steps for writing a function: 


1. On the first line, write the def statement for the function. 
Follow these steps: 
a. Write def and the function name. 
b. Jn parentheses, add the names of parameters that the function will act on. 
Separate parameters with commas. 
If your function doesn't have parameters, it still needs the parentheses. 


c. Type acolon at the end of the statement. 


Your def statement might look something like this: 
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Cir my FuMeici]Mm (aiegil, sue) g 


(Optional, but very important) On the next lines, write the docstring for the 
function. 


Follow these steps: 


a. Onanew line, indent four spaces and type three quotation marks (a triple 
quote). 
b. Type an explanation of what the function does and how to use it. 


See the sidebar "What's up, Doc?" to find out what to include. 
c. On the next line, indent four spaces and type a triple quote. 


Your function and its docstring might look something like this: 


Clie imhy uINeIeLeMm (Gace, eee) g 


Woy 


Perform my function on argl using arg2. Return a list. 


Woy 


Write the function's code. 


Indent each line of code four spaces. (If your function contains another 
subordinate or nested code block, such as an if statement, that block's code must 
be indented an additional four spaces, and so on.) 


When you're finished, the function might look like this (but containing Python 
code, of course!): 


Clie my Tune este (Eleeil, ewee~2)) 3 


Woy 


my function does this 


Woy 


do something 

abie {elas} 8 

GO ielseve 

return something else 


Giving another name to a function 


After you import a function, you can give it a new name. The additional name refers to 
the same object at the same location as the old name, as this example shows: 


SSS ELINA TINGE som 
<function tinyfunction at 0x61430> 
>>> myfunc = tinyfunction 
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>>> myfunc 
<function tinyfunction at 0x61430> 


Tip It's important to remember the difference between giving a new name to a function 
and calling a function while giving a name to the result. When you call a function, 
you add parentheses at the end of the function name. These examples show the 
difference: 


>>> myfunction = tinymodule.tinyfunction # giving a new name to a 
LINC CLOW 
>>> myresult = tinymodule.tinyfunction(2) # calling a function 


What's up, Doc? 


If the first code in a function is a string, it becomes the docstring for the function. The 
docstring defines what the function does and works with Python's help () utility. We 
recommend writing doc-strings for all your functions. 


A docstring usually spans multiple lines, so it begins and ends with triple quotes. 


By convention among Python programmers, a basic docstring for a function looks like 


this: 


1. 


The first line (the only line for a simple function) is a description of what the 
function does, or a summary if the function does several things or has several 
options. 


For example, take this function that prints something: 


def printme (me): 
print me 


This function does only one thing, so the docstring might look like this: 


"""Print the argument""" 


If the function needs more than one line of documentation, the second line should 
be blank. 

The third and subsequent lines explain options, defaults for optional arguments, 
and other usage notes. 


Say we change the printme() function so it takes an optional argument—a list of 
words not to print: 


def printme(me, bad word _list=None) : 
if bad_word_ list and me in bad_word list: 
print "How dare you!" 
else: 
print me 
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We might give it the following docstring: 


Wo 


Print argument. Optionally check against list of words not to print. 


Keyword argument: 
bad_word list -- list of words not to print (default None) 


Won 


You can find out more about Python's docstring conventions by reading Python 
Enhancement Proposal (PEP) 257 at: 


http://www.python.org/dev/peps/pep-0257 


Returning values from a function 


Names you define inside a function disappear after the function exits. To get values out of 
a function, you must do two things: 


e Include a return statement in the function with a name or value, like this: 
e def my function (b): 

e a = lo + 42 

e return a 

e Use the function in an expression or in an assignment statement: 

e my value = my function (3) 


The name on the left side of the equals sign in the assignment statement now 
stores the value that the function returned, as shown: 


SS> iy value = ihy cuMecsLom (3) 
>>> print my value 
45 


If a function has no return statement, or if the return statement doesn't specify what to 
return, the function returns None (a built-in name). Here's a function without a return 
statement. 


PSS CS moO _Kerlem TuMeiciem (9) ¢ 
pean Is 


SS> iy value = me eeinuiein wuiMce]Mm (3) 
3 

>>> print my value 

None 


So why would you ever want a function without a return statement? Some functions 
manipulate an item in place instead of returning a new item. Some functions perform an 


NFS 


action that doesn't need to return a result (for example, saving a message to a log). Such a 
function might not need a return statement. 


Tip The return statement not only passes a value out of the function, it also stops 
Python from running the rest of the code in the function. So return either should be 


the last statement in the function or should be used in conditional code, such as an if 
block, to jump out of the function when a certain condition is met. 


Argument Clinic: Passing Data 


Parameters are placeholders for information that you give to a function so it can carry out 
an action. 


There are many ways to specify a function's parameters. This section introduces 
specifying positional parameters, default-value parameters, and arbitrary numbers of 
parameters. It also describes how to avoid the quirks of default-value parameters and 
mutable objects. 

Introducing parameters and arguments 


Here are a few things to know about function parameters: 


e A function can have any number of parameters (or no parameters). 
e When you define a function, you specify how many parameters it has. 


It can have a specific number of parameters or an indefinite number of 
parameters. 


e You can define default values for any or all of the parameters. 


When you call a function, you need to give it a value corresponding to each parameter. 
(This actual value is called an argument.) Here are a few things to know about arguments: 


e Arguments are passed when you call the function. 
e Ifthe parameter has a default value, you don't have to pass an argument. 


e You can pass an argument as a literal or as a name. 


If you pass an argument as a name, the function acts on the object the name refers 
to. 


Specifying arguments when you call a function 


The basic way of specifying arguments is to pass them when you call the function. 
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To pass an argument, type it between the parentheses of a function when you call the 
function, like this: 


>>> a_function (myarg) 


The following example creates a function that prints the value of its argument, gives a 
name to a tuple, and passes the name to the function: 


PSS Clo 2_iwMe sem (3) ¢ 
print "you passed me the argument", repr (x) 


SS> myeiee; = (iL, 2, 3)) 
>>> a_function (myarg) 
you passed me the argument (1, 2, 3) 


The following example passes the same function a string literal without defining a name: 


PSS g_ibinewiem (Maelo) 
you passed me the argument ‘'hello' 


Specifying arguments with keywords and default values 
A function can have default values for some of the parameters. (Such default-value 
parameters are sometimes called keyword arguments.) Here are some circumstances in 
which default-value parameters are useful: 
e A particular value will be used more often than others. 
For example, Python's round() function takes a required number argument and 
an optional precision argument. The precision parameter defaults to 0 because, 
most of the time, when you round a number, you want the integer closest to the 


number rather than a decimal. 


e The function is designed to work in more than one way, depending on which 
arguments are passed. 


For example, the open () function opens a file for reading by default. If you want 
to open a file for writing, you need to specify a mode argument, 'w'. 


When adding default-value parameters to function definitions, follow these rules: 
1. Type any parameters that won't have keywords first. 
Parameters without keywords are called positional. 


2. Type keyword arguments in the form keyword=value. 
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The following function definition has one required (positional) parameter (ingredient) 
and two default-value parameters: 


def recipes(ingredient, servings=4, mode="Vegetarian"): 
When calling a function that includes default-value parameters, follow these rules: 
1. Pass values for all positional parameters first. 
Use the same order in which the function's parameters are defined. 


2. Pass values or names for any default-value parameters whose default values 
you want to override. 


Here are some additional guidelines for default-value parameters: 


e Don't pass information for default-value parameters whose default values you 
want to use. For example, this function call uses both defaults: 

e >>> recipes('shiitake mushrooms') 

e Ifyou include default-value parameters in the order in which they were defined, 
you can just pass their values, like this: 

e = 6>>> recipes('arugula', 8, "Carnivore") 

e Ifyou are including default-value parameters in a different order or skipping some 
parameters, use the keyword=value format, as shown here: 

e 6 >>> recipes('arugula', mode="No peanuts") 


Avoiding the quirks of default values 
Default values are useful as function parameters, but you need to be aware of their quirks. 
Immutable default values stay the same 


Python evaluates default values in function definitions only once—when the function is 
defined, that is, when the def statement first runs. (Another way to say this is that Python 
binds the default to the parameter.) def is an executable statement; this is why you have 
to define a function before you can call it. After that, if you change the value of a name 
you used as a default in the function definition, the function will continue to use the 
original value when the value is an immutable data type. In the example below, the def is 
passed the value of the integer mynum. When mynum is changed later, the function still 
uses the old value: 


>>> mynum = 8 # Integers are immutable. 
>>> def f(arg=mynum) : 
ISU UEC} 
SSS i () 
8 
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>>> mynum = 10 We'r 
SSS i () so cle Velie icine Sica Wises 3. 
8 


Mutable default values can change 


When you use a mutable object (such as a list, dictionary, or class instance) as a default 
and change the object inside the function, then the next time you call the function, the 
function uses the changed object. For example, the following function creates a list that 
accumulates the arguments passed to it. 


>>> def f(q, mylist=[]): 
mylist.append(q) 


return mylist 
SSS joreailiac i (Ya) 
[vary 
SSS joreaiaie i (Yl!) 
[ante Vig! | 
SSS joreiliac (oe!) 
feast Uo! Tew] 


Here's what's going on: 


1. When the function is defined, it creates an empty list object named mylist. 
2. Inside the function, the append() method adds an item to the list. 


The append() method changes the contents of the list in place— that 1s, mylist 
still refers to the same object after being changed. 


3. When the function is called again, it grabs the same list from the same place and 
uses the new contents. 


Assignment (=) always changes 


Assignment always causes a name to refer to a different object (that is, it rebinds), even if 
the object is mutable. So if you use assignment (=) inside the function rather than the 
append() method, mylist doesn't accumulate arguments. 


>>> def f(q, mylist=[]): 
MivliSie = iWWlaLSie <P EF 
return mylist 

Pee ee = i 
SS ie (el) 
[1] 

eee! el 2 
>>> f(a) 
[2] 
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Augmented assignment (+=) doesn't rebind mutable data types 


Augmented assignment operators (+= and *=) rebind when used with an immutable data 
type, but don't rebind when used with a mutable data type, such as a list. When used with 
a list, += is the equivalent of the extend() method; it adds items to the end of a list. So if 
we rewrite the function to use augmented assignment, the function extends mylist 
(which is mutable) by adding the argument to the end of the list: 


>>> def f(q, mylist=[]): 
mylist += q 
return mylist 

>>> a = [1] 

SS> 1c (fei) 

(el 

SSS aS 

>>> £ (a) 

[ky 2 


Working on a copy of a mutable object 


Because using mutable objects as arguments can have unexpected results, you're better 
off operating on a copy of the object. This function creates a new copy of its list and then 
adds a value to each list element: 


def f(mylist, data): 


newlist = mylist[:] # make a copy of mylist 
for i in range(len(mylist)): 
newlist[i] = mylist[i] + data 


return newlist 


Calling the above function doesn't change the original list: 


SSS ellaisic = fil, 2, 3] 
SS <x = (elise, 5) 
SSS joie slMe 3x 

[6, 7, 8 

SS> joreiliaic slliasic 

[ly 27 3 


Using a default value of None to redefine a mutable object 
If you want a function to redefine a list as empty each time you call it, use a default 
parameter of mylist=None. None is immutable, so the function won't be referring to a list 


object that might change. 


The following code gives the parameter mylist a default value of None in its def 
statement and then redefines mylist as an empty list inside the function: 
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def f(a, mylist=None): 
if mylist is None: 
myvlise = |] 
mylist.append (a) 
return mylist 


When you call the above function several times, it doesn't accumulate arguments: 


>>> a=l 
>>> f(a) 
[1] 

>>> a= 2 
>>> f(a) 
[2] 


Specifying a function with an arbitrary number of arguments 


To pass an arbitrary number of arguments to a function, use the parameter *args in the 
def statement for the function. Here are tips and rules for using *args: 


e The parameter *args stands for a set of positional arguments that aren't explicitly 
named in the function definition. 

e When you call the function and pass it arguments, *args automatically creates a 
tuple out of the arguments. 

e The parameter «args must come last in the function definition (or next-to-last if 
there is also a **kwargs parameter). 


To pass an arbitrary number of keyword=value pairs to a function, use the parameter 
**kwargs in the function's def statement. Here's how it works: 


e When you call the function and pass the keyword=value pairs, **kwargs 
automatically creates a dictionary from them. 
e The parameter **kwargs must come last in the function definition. 


TECHNICAL The names don't have to be *args or **kwargs; Python cares only 

STUFF that the * or ** operator comes first in the name. But Pythonistas 
always use *args and **kwargs. This consistency makes programs 
easier to read by humans. 


This example program shows how *args and **kwargs work in a function that also has 
two positional parameters: 


CISic ai_icwiaeicaleim (el, Io, “euecis, ~~ lsweuaejs)) 2 


jorcakone Wer als, 2 

joan “Moy als, Je 

jorealione “elites; alfs| class} aeiolkes', chetis 

jorealions “Ws "slenrelieis! als} jelauks| Cluicicakomeicye", laieeis 


It produces the following result: 
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SSeS @_iumecilom(l, "2", MeEheee’, “iollaimel’, "mes", see="how", iclney="ieuin'”) 
ey aly ae 


I) LS 2 
"EGS iS Claas irujoles (Velucse!, Voilstinel’, Vinee )} 
A kWeegs) sass this uct lonaryq|.SCeuqd NOW, theyle: Wren 


Unpacking arguments 

To pass the elements of a list, tuple, or dictionary as arguments in a function, use the * 
operator (for lists and tuples) or the ** operator (for dictionaries) when calling the 
function to unpack the elements. 

To unpack, you need these things: 


e A function 


The function can have any kind of parameters—positional or default, *args, or 
**kwargs... or even no parameters at all. 


e A list, tuple, or dictionary with the same number of elements as the number 
of parameters in the function 


Unpacking lists or tuples 


To pass the elements of a list or tuple as arguments in a function, use this line of code 
(substituting your function name and list or tuple name): 


myfunction (*my_list_or_tuple) 


The following code defines and calls a function, passing a list as its argument, using the * 
operator: 


>>> def func with three args(a, b, c): 
5 jOieiine Vicloks eheewineincs: cucee, ely lo; 


>>> weapons = ['fear', 'surprise', 'ruthless efficiency'] 
>>> func with three args (*weapons) 
the arguments are: fear surprise ruthless efficiency 


Unpacking dictionaries 


To pass the key:value pairs of a dictionary to a function, use this line of code (substituting 
your function name and dict name): 


myfunction (**mydict) 
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The dictionary keys must have the same names as the function's parameters. The 
following code passes a dictionary to func_with three args () using the ** operator. 
The dictionary's keys match the argument names: 


SoS MGir im Sirock = (Ya s"Recl Léleescee”, Moe" Jawilsioere, 
"o":"Camembert™ } 
SSS US Walield ielaiwSS euses (mee iin SieeyC!l<) 

the arguments are: Red Leicester Jarlsberg Camembert 


What's in a Namespace 


A namespace, also called a symbol table or scope, is storage for the names of objects 
Python knows about. 


REMEMBER The most important thing to remember about function namespaces is this: 
When you give a name to a value, you are always assigning within the 
function's local namespace unless 

e You explicitly say that the name is global 

e You specify that the name is an attribute of a particular object 
Discovering where Python looks for names 
Python has three basic layers of namespaces, listed here in order of most specific to most 


general. Python looks for names in this order and stops looking as soon as it finds the 
name: 


— 


Local (names defined inside a class, function, or method) 

2. Global (names defined inside a module—often function names and class names, 
but can be other names, too) 

3. Built-in (names that are always available) 


If Python can't find a name in any of those places, it raises a NameError exception. If you 
get an AttributeError, that means that Python found the leftmost (first) name but didn't 
find the name after the dot. Each object has its own namespace that Python searches. For 

more information about object namespaces, see Chapters 12 and 13. 


Understanding function namespaces 
When Python encounters a function definition in a chunk of code, it executes the 
definition. That prepares the function for being called later. When Python executes the 


definition, the following things happen: 


e The function's name is stored in the current namespace. 
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If the function is part of a module, this is the module's namespace. If the function 
was imported directly, it might be the main namespace in interactive mode. 


e Python creates a new namespace for storing any local names defined within the 
function. 


REMEMBER Python searches from local to global for names, but it doesn't search from 
global to local. That means global namespaces don't know about the names 
inside local namespaces. 


This behavior of Python's is useful for hiding complexity. It also prevents bugs that might 
occur if all names were automatically global—if that were true, it would be easy to use a 
name twice without realizing it, and unexpected things might happen. When names are 
automatically local, they are less likely to conflict. 


How local and global names work with assignment 


Names are automatically local (unless you tell Python to treat them as global). This 
means that if you give a name to a value (using an equals sign) inside a function, when 
you exit the function, Python will forget about the name. Here's a program that 
demonstrates this feature of namespaces using Groucho's famous saying. You could type 
it into a text file and save it with the name groucho.py: 


# groucho.py 
a lxoolke = “nam Ss lo@Sic iieiemel” 
facie Younesicle OF a Clog, a Is@Olkk 18", @_Isxo@lk 


clSic a_ Clog (()) 3 
a lOO = "E@O Clautkk ie@ ieee!” 
jee VMS Or a cog, ie's”, a loool 


a_dog() 
print "we're back outside of the dog again" 
joxaline Vaermel € INOOl als eigen”, e€) IoCls 


Figure 11-1 illustrates global and local names. 
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Global name assignment 


a_book="man’s best friend” 


Local name assignment 


a_doq 


a_book="too dark to read” 


Figure 11-1: A global and a local name assignment. 


If we run the program from the command line, it has this output: 


° 


% python groucho.py 

outside of a dog, a book is man's best friend 
inside of a dog, it's too dark to read 

we're back outside of the dog again 

and a book is again man's best friend 


How local and global names work with references 


Because Python searches for names first locally and then globally, you can reference a 
global name inside the function—assuming the function doesn't contain a local name 
that's the same as the global name. This example program (we call it brightdog.py) and 
Figure 11-2 show that a function can access the name a_book defined outside the 


function: 


# brightdog.py 
a_lsOolk, = Mma’ lxSSie irieaeincl!” 
Scie YOUESICS Ci a Clog, a2 looGls 18", e_lNOGIs 
Clie i _loieaejlae Cheyey (()) § 
jmime “Valmsalele Oi WEIS; clos, € SOO is Sicill”, a_lsoolk 
a_bright_dog() 
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Global name assignment 


a_book="man’s best friend” 


Global name referenced locally 


a_dog 


print “inside of THIS dog 
a book ts still”, a_ book 


Figure 11-2: A global name referenced locally. 


This code, run on the command line, produces the following result: 


fo) 


% python brightdog.py 
outside of a dog, a book is man's best friend 
inside of THIS dog, a book is still man's best friend 


TECHNICAL You can also define a function inside another function. This creates a 

STUFF new local namespace for the nested function. This namespace is 
called a nested scope. Nested scopes are mostly useful for functional 
programming. (Chapter 16 has some information on functional 
programming.) Nested scopes work differently in Python 2.1 and 
later than they work in earlier versions. We don't cover them in this 
book. 


Think globally, act locally 


It's easy to explicitly tell a function that a name is global: You just use the keyword 
global with the name. In the example program below, which we call colors.py, 
defining the name eggcolor as global allows the function to change the value of that 
name, and the changed value remains in effect after Python exits the function. In contrast, 
the name meat, which isn't declared global, is changed only inside the function. 


# colors.py 
eggcolor = "green" 
meat = "ham" 
print eggcolor, “eggs and", meat 
def breakfast (): 
global eggcolor 


eggcolor = "red" 

meat = "bacon" 

print eggcolor, “eggs and", meat 
breakfast () 
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print eggcolor, “eggs and", meat 


If you run this code on the command line, you get the following result: 


fo) 


% python colors.py 
green eggs and ham 
red eggs and bacon 
red eggs and ham 


REMEMBER It's usually a Bad Idea to use global names because they are more likely to 
cause name conflicts in complex programs. 


Sorting out module namespaces 


Each module (a file with Python code, ending in . py) has its own namespace. All the 
functions defined in the module use the module's namespace as their global namespace. 
The functions won't look elsewhere (except for the built-in namespace) for names that are 
referenced within them. This means that the author of a module can use global names in a 
module without worrying about their conflicting with global names used elsewhere. In 
other words, whenever you see "global" used in Python, think "module global." 


Tip For more about modules, see Chapter 12. 


When a module imports another module, the imported module's name becomes part of 
the importing module's namespace. You can then access its functions and other 
information by using the module name, a dot, and the function name, the same way you 
can when you import a module into interactive mode. 


Imagine you have a module called mymod.py that does nothing but import another 
module: 


# module 'mymod' 
import math 


If you import this module into interactive mode, you can print the math.pi constant by 
typing this code: 


>>> import mymod 
>>> mymod.math.pi 
3, LAS 9265359) 


But the name math and its constant pi are known only inside the module. The 
surrounding namespace of interactive mode still doesn't know about the math module: 


>>> print math.pi 
Traceback (most recent call last): 
Pale “W<sieciims”, lime i, am <mochile> 
NameError: name 'math' is not defined 
TECHNICAL It's possible to manipulate a module's global names by using the 
STUFF same notation used to refer to its functions, modulename. itemname. 
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You should do this only if you know what you're doing. (It's an 
advanced feature we don't cover in this book.) 


Chapter 12: Building Applications with 
Modules and Packages 


Python provides ways to organize code into files and directories: 


¢ Modules are the files that contain the building blocks and glue of your program, 
including functions, import statements, classes, and so on. 

e Packages organize modules in directory hierarchies that help users understand 
how everything works together. 


This chapter shows you how you can use modules and packages to organize your projects 
and take advantage of all the tools available for Python. 


Modular Living: Storing Your Code in Files 


Python strongly encourages you to organize function definitions and other statements in 
separate files. These files are called modules. A module is a text file containing Python 
definitions and statements. The filename is the module name with the suffix . py 
appended. 


TECHNICAL Some Python modules are written in languages other than Python, 

STUFF most commonly C or C++. Such a module is called an extension 
module. 

In the following sections, we give you the lowdown on creating your own modules, 


getting access to the modules that come with Python, and using modules in your 
programs. 


TECHNICAL Better coding through modularity 
STUFF 


Modules have several benefits that let you code more efficiently and 
spend less time stomping bugs: 


e Modules save time. 
You can write code once and use it in many programs. 


e¢ Modules hide complexity. 
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Each module creates its own namespace, so its names won't 
conflict with names defined elsewhere. 


e Modules make debugging easier. 
Storing chunks of code in modules makes it possible to write 


shorter programs. Python's debugging tools specify which 
module an error is found in. 


Importing a module or its contents 
To access a module's code, you import the module into either 
e Interactive mode 


Tip Importing a module in interactive mode is covered in Chapter 2. 
e Another Python program 


Both these ways of importing are similar. The only difference is where you type 
the import statement. 


The following sections cover how to import a module, what Python does when you 
import a module, and ways of accessing the goodies in a module. 


Importing a module 


To import a module in interactive mode, or in another Python program, type import 
followed by the name of the module without the . py suffix, like so: 


import modulename 
Initializing a module 


Importing a module for the first time in a particular program, or in interactive mode, 
causes Python to perform a series of actions called initializing the module. Here's how 
initializing works: 


1. Python creates a module namespace that stores the names defined in the module. 
2. Python runs the code in the module. 
3. Python stores the name of the module in the local namespace. 


TECHNICAL Initializing happens only the first time the module is imported. If you 


STUFF import the module again, the module code doesn't run, but the import 
updates the current namespace. 
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Accessing functions and other items inside a module 


When you import a module, all the functions and other information in the module 
(collectively called the module's attributes) are available to you. 


To access a module attribute, type the module name, a dot, and the attribute name. If you 
are calling one of the module's functions (see Chapter 11) or creating a class instance (see 
Chapter 13), also pass any arguments that are required. For example: 


mymodule.myfunction (x) 


TECHNICAL Other ways of accessing module attributes 

STUFF 
You can import attributes from a module into the local namespace 
directly by using one of the following lines of code: 


e To importa specific item directly, type from, the module 
name, import, and the item name, as shown: 

e from mymodule import myfunction 

¢ To import all of a module's functions directly, type from, the 
module name, import, and an asterisk (*), a wildcard 
character that stands for "all." For example: 

e from mymodule import * 


Warning! Most of the time it's a Bad Idea to use the import * syntax 
because 


e You can't control what you're importing. 

e You risk polluting the local namespace with all the names 
from the imported modules, which makes debugging much 
more difficult. 


When you access module attributes through the module name, you're 
less likely to run into conflicts between the names inside one module 
and the names inside another module. And your code will be easier to 
read. 


For example, you might have two modules that each have a function 
to combine two values in some way. If each module's function were 
named combine (), then if you tried to access each of these functions 
directly, you might accidentally use the wrong one, or readers of your 
code might not know which function you were using at any particular 
time. But if you always use modulename.combine (), then which 
function you're using is clear. 


There's one situation, however, when it's appropriate to import an 
item from a module directly. That's when a module is a wrapper 
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around a single class or function. For example, the module 
cStringIo contains a single function called StringIo(). (See 
Chapter 19 for details.) The simplest way to use StringI0O() is to 
import it with this line of code: 


from cStringIO import StringIO 


Tip If you don't remember the names of the functions inside a module, use the dir () 
command in interactive mode to list them, like this: 


>>> dir (mymodule) 
Giving a local name to a module 


In this section, you find out two ways to give a local name to a module or a module's 
functions: You can give an extra name to a module or function, or you can give a module 
a different name when you import it. 


This section also explains the difference between giving an extra name to a function and 
calling the function while giving a name to its result. 


Giving an extra name to a module or function 


To give an extra name to a function or module (so that both names refer to the same 
thing), first make sure the object has been imported; then type the new name, =, and the 
name of the function or module, like this: 


myfunction = tinymodule.tinyfunction 
Importing a module using a different name 


To import a module (but not a function) using a different name, type import, the module 
name, as, and the name you want to use. For example: 


import tinymodule as mymod 
Tip Although it's not common, you can also combine the from import and import as 
syntax, as follows: 


from tinymodule import myfunction as a_function 
REMEMBER If you import a module using a different name, the local namespace won't 
know the real module name; it will know the module only by the name 
you used to import it. 
REMEMBER The syntax for giving a local name to a function is different from the 
syntax for calling a function and giving a name to the result. When you 
call a function, you add parentheses at the end of the function name: 
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myresult = tinymodule.tinyfunction (2) 
Rules for writing and naming modules 


The same rules apply to naming modules that apply to naming other Python objects: 
Hyphens and spaces aren't allowed, and you can't use a Python keyword as the name. If 
you do any of these things, you'll get errors. 


The Python community uses the following conventions for writing modules. The 
conventions make Python code more standard and easier to read: 


e Modules have short, lowercase names. 
¢ Modules that exist mostly to export a single class are named after the class. 


In this case, the module and class have the same name, but the module name is 
lowercase and the class name 1s capitalized. 


e All import statements come at the beginning of a module. 
e The name of a non-public module (or function, class, method, name, or 
attribute) should begin with a single leading underscore. 


For example: 


_myNonPublicModule 


A non-public name is not imported when you type from modulename import *, 
although you can still import it explicitly. See Chapter 13 for more about non- 
public names. 


Warning Avoid giving your module file the same name as a standard Python module. 
Otherwise, the wrong module might be imported. 


Module, module, where is the module? 
Python keeps track of module files and other files it needs in several ways. 


On most operating systems, the PYTHONPATH environment variable lists the paths 
(directories or folders) where Python's modules are stored. PYTHONPATH uses the same 
format as the shell variable patu. The format depends on the operating system you are 
using. For example, on UNIX or Mac OS, Python might be installed in 
/usr/lib/python/ or /usr/local/lib/python/. On Windows it might be installed in 
C:\Program Files\Python25 


When you import a module, Python searches possible module locations in the following 
sequence until Python finds a file with that name and the suffix . py: 
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— 


The current directory 

The directories listed in PYTHONPATH 

3. The directories in the PATH environment variable (which depends on your 
operating system and account setup) 


Le 


All the search paths that Python knows are stored in the list sys.path, which is part of 
the sys module. This list stores these path names: 


e The directory containing the input script (or the current directory) 
e The contents of the PYTHONPATH variable 
e The default search path 


To modify sys.path, you use list methods. You need to import the sys module first. The 
following example appends a path to sys.path: 


>>> import sys 
>>> sys.path.append('/ufs/guido/lib/python') 


Warning Be careful not to remove important directories (such as the directory in which 
Python is installed, or any subordinate directories) from the search path, or 
Python may lose track of its modules. 


Finding what's in standard modules 


Python comes with a library of standard modules. Many of these modules are covered in 
Part IV and elsewhere in this book. 


Tip If you want to find out the gory details about any of the standard modules that we 
don't cover, see the Python Library Reference at http://www.python.org (it's usually 
also included when you install Python; instructions for installing Python 
documentation are in Appendix A). 


The built-in function dir() returns a sorted list of the names a module defines. To use it, 
type dir followed by the module name in parentheses, like this: 


>>> dir (modulename) 


To get help on a named item in a module, type he1p; then type (in parentheses) the 
module name, a dot, and the item name. For example: 


>>> help (modulename.functionname) 
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Wrapping It Up in a Package 


A package is a collection of related modules stored in a directory and subdirectories. This 
section explains why packages are useful, the requirements for setting up packages, and 
the ways to import packages. 


A__name_isa_name_,exceptwhenit's main _ 
Within a module, the module's name (as a string) is stored in the global name _name_. 


But there's an exception to this: If the module file is run as a program from the command 
line, Python sets the module's name _to'  main_'. 


This feature is useful for including code (usually test code) in a module file that runs only 
when you start the module from the command line (by typing python modulename .py). 
The code looks like this: 


if name == ' main ': 
do something 


When you include code that runs only under these circumstances, you don't have to 
remove the code when you import the module. That's because when a module is 
imported, name __ is the module's name, not '  main__'. 


This code goes outside any function definitions or import statements in the module, 
usually at the very end of the module. 


Here's an example of a small function and some test code for it. 


def f(x): 
VY = X**X 
return y 


ae name == "| main ": 

t "testing..." 

print "passing the value 2" 
Z = £(2) 
print "the function returns", z 


The purposes of packages 
Packages are another way of taking advantage of Python's ability to hide complexity. 


Tip Packages have some useful features: 
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e A package has its own namespace. 


Names defined within a package—including module names—won't conflict 
with names defined elsewhere. 


e Packages can contain modules. 
e Packages can contain subpackages. 


Packages are ideal for storing and organizing a lot of related features. 


Requirements for packages 


These are the requirements and guidelines for packages: 


A package lives inside a directory, and its subpackages live inside 
subdirectories of that directory (generally speaking). 
Store the package's main directory in either 
o One of the two special package directories on your computer 
o The directory of your application 


See "About special package directories," later in this section. 


Just as a module file names the module, a package directory names the 
package. 


That allows you to import items in the package by using the import 
packagename.modulename format. 


A package directory must contain a file called init .py. 

In the simplest case, init .py is just an empty file. But it often contains code 
that initializes the package. For example, the best way to import specific 
submodules with a particular package or subpackage is to load them in the 
JEN. apy file. 

The init__.py file sometimes contains the _all___ list (see next item). 
Include an index of your package's contents in a list named all _ 

This list helps Python code run on different operating systems. When you include 
module names in __all__, the command from packagename import * imports 


only those modules. 


TECHNICAL If no one will ever import * from your package, you don't 
STUFF need the list. 
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About special package directories 


When Python is installed, it sets up specific directories for packages. If you store 
packages in one of these special package directories, you're less likely to run into trouble 
if your package names are the same as module names elsewhere on the Python path. 


There are two special directories for packages: 


e lib/site-python/ 

e This directory is for packages that work the same way with all Python versions. 

e §lib/pythonxx/site-packages/ 

e This directory is for packages that depend on the Python version. XX changes to 
represent the Python version number. 


Example package structure 


Here's an example structure for a package that handles graphics. More package help is 
available at http://www.python.org/doc/essays/packages.html. 


graphics/ Top-level package 
alice 6 DS Initialize the graphics package 
formats/ Subpackage for file format conversions 
__dimalie___o fOiy 
gifread.py 


gifwrite.py 


effects/ Subpackage for graphics effects 
__ aia SY 
speckle.py 
swirl.py 


invert .py 


filters/ Subpackage for filters 
_ Smale JOY 
noise.py 
mask.py 


Importing items from packages 
You can import items from packages in several ways. 
TECHNICAL Importing with packages works in much the same way as importing 


STUFF with modules, except that you sometimes have to include more levels 
of names. 
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Individual modules and packages 


You can import individual modules and packages in a couple of ways, depending on how 
they will be used in your program. 


Importing by full name 


If you'll be using many parts of a complex package, use the following format to import an 
individual item by its full name: 


import graphics.effects.invert 
This type of package import statement has a couple of rules: 


e Each name except for the last must be a package. 
e The last name must be a module or a package; it can't be a class, function, or 
named object (attribute). 


TECHNICAL To call a function from a module imported by its full name, type the 
STUFF full name of the module and then the function name, like this: 


graphics.filters.noise.edges (args) 
Import by using from 


If you'll be using only a few parts of a package, use the following format to import an 
individual item by its full name: 


from graphics.filters.noise import edges 
Tip This format works for importing any of the following objects defined in the package: 
submodule, subpackage, function, class, and named object. 


Here are a couple of examples of importing an individual item: 


e The following example imports the noise module: 
e from graphics.filters import noise 


If you import noise this way, you use this code to call its function: 
noise.edges (args) 


e The following example imports the edges () function: 
e from graphics.filters.noise import edges 


If you import edges () this way, you use this code to call it: 


edges (args) 
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Import all items from a package 


If you're sure that it won't cause namespace conflicts (for example, if you're using one 
small package in your program and not importing other modules or packages), use the 
following format to import all the items from a package: 


from graphics import * 
Using import * imports the items inthe all_list. 


TECHNICAL Ifno _ali___list is defined, the import * statement imports these 
STUFF names from the package: 


e The package name 
e Names found in import statements in the file 
e Names of modules and subpackages (but not submodules) 


e Names definedin init .py files 


The import * statement does not import any names with a single underscore prefix (for 
example, mymodule). 


When packages contain subpackages, there's no shortcut to refer to submodules of sibling 
packages (packages defined at the same level). You need to use the full name of the 
subpackage (packagename. subpackagename). Python 2.5 has a shortcut for importing 
from sibling modules. The shortcut is part of the absolute importing feature, which is 
covered in the following section. 


Absolute imports in Python 2.5 


In Python 2.4 and earlier, type the whole package and subpackage hierarchy when you 
import from a package (for example, from graphics.filters import noise). Ifyou 
don't, someone reading your import statement has no way of knowing whether it imports 
another module from the package or a top-level package that's part of the Python 
distribution. 


TECHNICAL Although it's possible to use relative imports in Python 2.4 and 
STUFF earlier, the rules are too complicated to document in this book. 


Absolute importing, new in Python 2.5, addresses this problem. There's no ambiguity 
about which module is being imported because Python searches for imported modules or 
packages on sys.path only. To import sibling modules in a package, use relative imports 
(see the following section). To access absolute importing, type this code into your module 
or into interactive mode: 


ECO UNEASE EIOSO UES: siNHorwie 
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Relative imports in Python 2.5 


When you turn on absolute importing in Python 2.5 (by using from _ future import 
absolute import), new syntax is available for importing other modules in the package 
directory your module is in. Follow these rules: 


e Use the from..import syntax. 
e Type one or more dots in front of the name of the package or module you're 
importing from: 
o The first dot stands for the current module's package. 
o The second dot stands for the package containing the current module's 
package (the parent package). 
o The third dot stands for the parent of the parent, and so on. 


Here are some relative import statements. If you write the swirl .py module in the 
package hierarchy (described in "Example package structure," in this chapter), you 


import other modules in the package as follows: 


e This statement imports swirl.py 's sibling module, speckle.py: 


e from . import speckle 
e This statement imports the gifread module: 
e from .formats import gifread 


e This statement imports the edges () function from the noise module in the 
Filters package: 
e from .filters.noise import edges 


Chapter 13: Getting Classy 


Overview 


This chapter is about classes, a tool for creating new types of objects in Python. When 
you create a class, you put data (also called state) and behavior in one suitcase. Programs 
based on these suitcases of data-plus-behavior are easily customized. 


Python also uses classes to organize many of its own features. For example, Python's 
built-in data types, such as strings and lists, are classes. The use of classes hides 
complexity from users who don't care about it. You can use a list without worrying about 
how lists work, and you can perform an action such as sorting a list without worrying 
about how sorting works. (And if you ever had to write a sort function, you know that's a 
good thing.) 


In this chapter, you find out how to create and use these Python objects: 
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e Classes, subclasses, and superclasses 
e Class methods 

e Class instances 

e Class and instance attributes 


You also find out why classes are important to Python's modular design. 


Alley-OOP! Some Object-Oriented Programming 
Concepts 


Python is a language that supports object-oriented programming (OOP for short). In OOP 
parlance, objects are bundles of data and actions (methods) that operate on the data. OOP 
programs are built around such bundles. 


TECHNICAL In contrast, languages such as C focus on procedural programming, 

STUFF and the Scheme/Haskell languages focus on functional 
programming. Python is a balanced language with good support for 
both procedural and OOP styles and some support for functional 
programming. 


Objects and functions aren't mutually exclusive—in Python, methods are just functions 
attached to objects. But considering a program in terms of data/action bundles is a 
different approach than creating data objects and then using separate actions on them. 


We don't describe object-oriented programming in detail in this book, but the following 
concepts are important for understanding Python classes. 


Objects 
In object-oriented programming, an object is a thing that bundles data and behavior. 
To use a real-world example, imagine you have a dog (you might not have to imagine). 


e The dog belongs to a breed. 

REMEMBER If the dog were a Python object, the breed would be its type or 
class. As shown in "Using multiple inheritance," later in this 
chapter, you can even create "mutt" types. 

e The dog has characteristics associated with it: its name, age, and so on. 

REMEMBER If the dog were a Python object, these characteristics would be its 
data attributes. 

e The dog can do actions—run, sit, bark, mutilate shoes, and so on. 


REMEMBER If your dog were a Python object, these would be its method 
attributes. All of the attributes "go with" the dog wherever it goes. 
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Likewise in Python, if you have an object—say, a list—you use the name of the specific 
list object to access both 


e Its data (the items in the list) 
e The actions that list objects can do (slice, append, and so on) 


Inheriting, overriding, and extending 


Subclasses inherit behaviors, abilities, and properties from the general (parent) classes 
they are based on. For example, Labrador Retrievers are a subclass of all dogs and inherit 
the properties and abilities of dogs. 


When a subclass changes the way a parent class works or changes its data attributes, the 
subclass is overriding the properties of the parent class. The Lab is gentler than some 
other kinds of dogs, so a Lab class might be said to override the gentleness quality of 
dogs in general. 


When your subclass includes new data or methods, it is extending the properties of the 
parent class. A Lab has the built-in skill of retrieving. A Lab class might be said to extend 
the abilities of dogs to include retrieving. 


Tip One way to think of inheritance is to use the phrase is a. A Lab is a dog; that means a 
Lab inherits properties and abilities of dogs. 


Polymorphism and duck-typing 


In some languages, a single function or method works only with specific types of objects. 
In Python, many tools work with more than one type of object. For example, the in 
keyword (which calls the special method contains __()) acts on lists, tuples, and 
dictionaries—objects of different types. This feature is called polymorphism. 


Mirror, mirror on the wall, what's the object of this all? 


Python terminology gets a little confusing at times, especially with all the different things 
that the word object can stand for. In Python, the word object refers to any item in a 
Python program that has its own storage space—from a simple name to a multi-megabyte 
dictionary (or any of its elements) to a function, module, class, or instance. But in object- 
oriented programming, the word object refers to any item that combines data and 
behavior. These items (along with other items) are also objects in Python. Specifically, 
they are called classes(or class objects) and instances (or instance objects). Any 
objections? 


Python's polymorphism is based on names. For example, file objects have three basic 
methods named read(), write(), and close (). If you create a class that has those three 
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methods, then even though your class isn't officially a file object, you can use instances 
of your class in many of the places that Python expects a file object. Pythonistas often 
call this feature duck-typing because it's like the saying, "If it looks like a duck and 
quacks like a duck, it must be a duck." In programming terms: If this object has the 
attributes of a file and the read(), write (), and close() methods of a file, the program 
accepts it in cases where the data type file is called for. 


Now Class, for Instance ... 


In this section, we explain how classes fit in with other Python objects and, in general 
terms, how classes and class instances work together. These are the most important things 
to remember: 


¢ <Aclass is a general description of something. 
e¢ Aclass instance is a specific example of what the class describes. 
¢ In order to do things with a class, you usually need to create an instance. 


Classes, modules, and functions 
Python classes are similar to Python modules and functions in these ways: 


e Classes, modules, and functions all contain assignments and statements. 

e Both classes and modules usually include function definitions—code blocks that 
begin with a def statement and perform some action. (In classes, these are called 
methods rather than functions.) 

e Classes, modules, and functions all have their own namespaces. 

e You call classes and functions by using the name followed by parentheses. Calling 
a class does one thing: It creates an instance of the class. 


A class is like a template 


A class is a template for data and behavior. You use the template to make specific objects 
called instances. The instances share the attributes of the class. 


Say you have a copy of the American Kennel Club breed standard for a Labrador 
Retriever. The AKC document is like a class because it describes the qualities and 
abilities a Lab has. The data attributes for the Lab breed—its coat quality, size, eye color, 
and so on—are described in the class, and so are the method attributes—its ability to 
retrieve, to swim, and so on. 


A Python class defining a Labrador Retriever might include this code: 


class Lab: 
weight = 60 # data attribute 
def retrieve(self, thing): # method attribute 
return (thing) 
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Note that the AKC breed standard isn't a dog, it is just a description of the dog. Likewise, 
in Python, the class isn't the actual object you manipulate. To work with the class, you 
first tell the class to create an instance. (This is where our example breaks down—you 
can't use the AKC breed standard document to create a dog; you need a couple of other 
dogs for that!) 


Telling the difference between a class object and a function object 


The way you create a class instance looks superficially similar to the way you name the 
result of a function. In fact, if you don't know whether the object is a class or a function, 
you might not be able to tell which is being used by looking at the assignment. (Classes 
frequently do not take arguments, and functions usually do, but this isn't universally true.) 


>>> fido = Lab() # creating a class instance 
>>> y = len('mystring') # calling a function 


To determine what kind of object is associated with a name, use the type () function, like 
this: 


>>> type (fido) 
<type 'instance'> 
>>> type (y) 
<type ‘int'> 
>>> type (Lab) 
< 

> 

< 


type 'classobj'> 
>> type (len) 
type 'builtin function or method'> 


Note that a function can return an instance. 


>>> def foo(): 
return Lab() 


>>> type (foo () ) 
<type 'instance'> 


An instance is a copy made from the template 


An instance is a specific object, with its own name, that has the same attributes as the 
class you use to create it. 


Tip If you get a Lab from the animal shelter and name her Pilot, then your dog is like an 
instance of the Labrador Retriever class. She has her own name, but she shares the 


attributes of Labrador Retrievers in general. 


So if you were to describe Pilot in Python code, you would type this, which creates an 
instance (pilot) of the Lab () class: 
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>>> pilot = Lab/() 
All about class and instance attributes 


Each thing that you name inside a class, whether it's a method, data, or some other kind 
of code, is an attribute of the class. 


REMEMBER When you create an instance of the class, it starts with the same attributes 
as the class. You can add or alter instance attributes after that. 


To see the value of a class or instance data attribute, type the class or instance name 
followed by a dot and the attribute name, like this: 


>>> Lab.weight 
60 

>>> pilot.weight 
60 


Use the same code to see how Python internally identifies a method attribute: 


PES Lalo. eSicewSsweS 

<unbound method Lab.retrieve> 

>>> pilot.retrieve 

<bound method Lab.retrieve of <_main_.Lab instance at 0x62300>> 


Tip Object-oriented programmers say that Lab has a weight and has a retrieve () 
method. See the sidebar, "When Not to Use Inheritance," later in this chapter. 


Making and Calling Classes 


The following sections cover the syntax of writing a class, calling a class to create an 
instance, and using a method via an instance. These, along with assigning and changing 
attributes (discussed in the previous section), are the most common class and instance 
tasks. 


In this section, we use the following example class: 


class NameTag: 
def aLIMaLie (self, myname): 
self.myname = myname 


def say(self): 
print "Hello, my name is", self.myname 
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Creating a class 


To write the first line of a class definition, type class, the name of the class, and a colon, 
like this: 


class NameTag: 

Tip By convention, class names are capitalized and have embedded caps. By contrast, 
function and method names are usually all lowercase. This makes it easier to tell at a 
glance which object is being used at a particular point in a program. 

Writing an __init__() method 

Most classes have an init _() method. This method runs when you make an instance 


of a class. It specifies how many arguments the instance needs and puts the instance into 
a known initial state. 


Rules 
A few key things to know about init _() methods: 
e The init () method is a special method called an initializer. 


In Python, special methods begin and end with two underscore characters. 


e It's conventional forthe init () method to come first in the class 
definition. 
e The init () method, like other methods, needs a self parameter (which 


stands for the instance), plus any other parameters that the class needs to 
create an instance. 


Basic syntax 


To write abasic init _() method that includes only the self parameter, follow these 
steps: 


1. On the line beneath your class definition header, indent four spaces. 
2. Typedef init (self): 


Arguments 
Many classes require you to pass arguments when you make instances from them. 
Creating attributes for these parameters inthe init () method lets other methods 


access them easily. 


To write an init () method that includes parameter attributes, follow these steps: 
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i? 
8. 


Me 
OR 
11. 


On the line beneath your class definition header, indent four spaces and type 
this: 


get  imie (selz 
On the same line, type a comma and the name of a parameter, like this: 
def init (self, myname 


Repeat Step 2 for each parameter you want to include. 
Type an end parenthesis and a colon: ) : 


Your def statement might look like this: 
cet imide (sel, uiyymeime) 3 


Go to the next line and indent four spaces past the def statement. 
Create an attribute for the first parameter by typing self, a dot, the name of 


the attribute, an equals sign, and the name of the parameter, like so: 
ClO ime (sel, mymenne)) ¢ 
self.myname = myname 


Repeat Step 6 for each parameter in the def statement. 


Writing more methods 


The init _() method usually gives the class only some initial information. Most 


classes 


also include methods that perform actions. You create methods by using the def 


statement, exactly the way you write a function. 


To write a method, follow these steps: 


if 


Making sure to indent 4 spaces, type def, the name of the method, and (self, 
like this: 
def say(self 


(Optional) Type any additional parameters the method requires. Separate 
parameters with commas. 


Don't use attributes as parameters—you access them through self. 


End the line by typing ) :. 
On the next line, indent four more spaces and type the code for the function. 


If your code references class or instance attributes, be sure to use the self.name 
syntax. 


This example code includes only the self parameter. 


def say(self): 
print "Hello, my name is", self.myname 


Pure selfishness 
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When you work with Python classes, you encounter the name self a lot. 


The easiest way to understand self is this: self stands for the instance that you create 
from a class. For example, if you create an instance named pilot from the class named 
Lab (), self stands for pilot. 


When you are writing a class and you define a method in that class, you need to explicitly 
include the self parameter in the method definition, as in the example below. Otherwise 
the method wouldn't have any way of knowing which instance to use. 


class NameTag: 
def init (self, myname): 
self.myname = myname 


When you give a name to a value in a method definition, you use self. in front of the 
name. The line self.myname = myname makes the myname parameter automatically 
available to other instance methods. That means you can call those methods without 
passing myname as an argument. Note the following code: 


def say(self): 
print "Hello, my name is", self.myname 


If this were a function, you would have to pass myname as an argument or else Python 
would complain that myname isn't defined. But because it's a method and you've already 
defined self.myname inthe init _ statement, the method can figure out the value of 
myname by using the self object. 


The self parameter also makes life easier when you create an instance. Python 
automatically passes the self argument for you. For example, if you create an instance of 
NameTag(), you only have to give it the value of myname. It automatically uses the name 
of the instance as the value for self. 


Static and class methods of new-style classes work differently. See the Chapter 14 for the 
low-down on new-style classes. 


Creating other class attributes 


You can put any kind of code or assignment into a class. Anything inside a class that you 
give a name to becomes an attribute of the class. 


For example, we could add a favorite color attribute to our NameTag class: 


class NameTag: 
favorite color = "Blue" 
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Then if we create an instance of the class, the instance also has this attribute: 


SSS Cicail Seals = Nem ers (" lamesiloir ’) 
PSS Giaill Seeker. teworiirS Cole 
BLS | 


Creating an instance 


To create an instance of a class (which is also known as calling a class), you give it a 
name, and then type the class name, parentheses, and any parameters. Follow these steps: 


Type a name for the instance, followed by an equals sign, like this: 
s nelle = 


Type the class name and ( as shown: 
. hello = NameTag( 


If your class requires parameters, include values or names for them. Separate 
parameters with commas. 


Oa BW NH Re 


If you call the NameTag class created in the previous section, you need to pass a 
value for one parameter, myname. 


hello = NameTag('Arthur' 
6. Type). 
The completed code looks like this. The instance name is hello. 
hello = NameTag('Arthur') 
Calling a method via an instance 


After you have created an instance (by using code like instance name = 
ClassName () ), you can perform the actions defined by the class's methods. 


REMEMBER To call a method via an instance, type this on one line: 

e The name of the instance 

e Adot 

e The name of the method 

e Apair of parentheses 

Pass any needed arguments inside the parentheses. 

For example, to call the retrieve () method of the pilot instance, type this code: 
>>> pilot.retrieve ("ball") 


Siva! 
REMEMBER You can't use this code to call the method by using the name of the class 
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because the class is only a template for the instance object. An AKC breed 
standard document can't retrieve a ball—only the actual dog can. Here's 
what happens if you try to call the method by using the name of the class: 


>>> Lab.retrieve ("ball") 
Traceback (most recent call last): 
dalle WeeiechiigeY Istioe) i, alin 2 
[TypeError: unbound method retrieve() must be called with Lab 
instance as first 
argument (got str instance instead) 


TECHNICAL There is actually a way to call a method by using the name of a class. 

STUFF This is known as calling an unbound method. It's mostly an advanced 
programming technique, but it's also used to extend methods in 
subclasses (see "Extending superclass methods," later in this 
chapter). Briefly, you create an unbound method by using the name 
of the class instead of the name of the instance (for example, 
Lab.retrieve rather than pilot.retrieve). To call it, you pass it 
an instance name along with any other arguments the method 
requires—for example, Lab. retrieve(pilot, "ball"). 


Where did self go? 


= 


Does something seem to be missing in the code for calling a class? The init () 
method in the class NameTag has two parameters—sel1f and myname. Why do you need to 
pass only one parameter when you create a class instance? 


The answer is that self is a special parameter that essentially passes itself "behind the 
scenes" when you use the class instance. So you don't have to pass self in this statement 
(although there are some cases in which you do). In fact, if you do pass it in this 
statement, you get an error: 


>>> hello = NameTag(self, 'Arthur') 

Traceback (most recent call last): 
File "<stdin>", line 1, in ? 

NameError: name 'self' is not defined 
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Getting Inside the Factory: How Class and Instance 
Namespaces Interact 


A class and an instance each have a namespace, which is storage for the names of objects 
that Python knows about. When you create an instance of a class, the instance has its own 
namespace, but it also shares the class's namespace. 


Whenever Python encounters a name that's part of an instance, it looks for the value first 
in the instance's namespace. If it can't find the value there, it looks in the class's 
namespace. 


Tip Function namespaces and global namespaces interact in similar ways. See Chapter 
11. 


So anything that you do to the class namespace (such as changing an attribute of the 
class) is also reflected in the instance—except when the instance has already overridden 
that value from the class namespace. The next section shows how it works. 


Changing the values of class and instance attributes 


You can change the values of a class's attributes. When you create an instance of your 
class, you can also change (or override) the values of any class attributes it inherits. The 
effect these changes have depends on what data types the attributes are, how you change 
the values, and in what order you change the values. 


When we first define our Lab instance, pilot, it has the same value for weight that the 
Lab Class defines: 


>>> Lab.weight 
60 

>>> pilot.weight 
60 


Changing the value of the class attribute 


To change the value of the class attribute, type the class name, a dot, the attribute name, 
=, and a new value, like this: 


>>> Lab.weight = 50 
Warning Changing a class's attributes in code outside the class (and especially outside 
the class's module) makes your code hard to read and to debug. 
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Changing the value of the instance attribute 


To change the value of the instance attribute, type the instance name, a dot, the attribute 
name, =, and a new value, like this: 


>>> pilot.weight = 45 
Effects of changing attribute values 
The effects of changing attribute values are as follows: 


e Ifyou change the class attribute first, the instance value also changes: 

e >>> Lab.weight = 50 

e >>> pilot.weight 

O50) 

e Ifyou change the instance attribute first, the class value does not change: 

e >>> pilot.weight = 45 

e =6>>> Lab.weight 

O 60) 

Warning If the attribute is a mutable data type, such as a list, and you use a 
method that changes the attribute in place (such as the append() method 
of lists—see Chapter 8), then changing the instance attribute does 
change the class attribute. Be sure to operate only on copies of mutable 
instance attributes or avoid using change-in-place methods on them. 

e Giving a new value to the instance attribute overrides the value it originally gets 
from the class. So if you change the class attribute after you change the instance 
attribute, the instance attribute does not change. 

e >>> pilot.weight 

Ome 435) 

e >>> Lab.weight = 100 

e >>> pilot.weight 

e425 


Adding an attribute to an instance 


In "Making and Calling Classes," earlier in this chapter, we describe how to add an 
attribute to a class, and the preceding section describes how to change the value of a class 
or instance attribute. You can also add a data attribute (and value) to an instance that 
you've created. Just write code like this: 


SSS pillloe cose @olor = “yellow 
REMEMBER This attribute is stored in the instance's namespace, so the class won't 
know about this attribute, as illustrated here: 


PSS leis s@O@ee Colles 
Traceback (most recent call last): 
walle Weetechhias”, sins i, alma 2 
yc ici Wobiceihiewoues Clleiss) leisy Ines} IMG) slivicicallowlee VSOaie Colo’ 
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Class and Instance Conventions 


Python programmers use some important conventions when writing and implementing 
classes. Python will understand your code if you don't use them, but humans who have to 
read your code and figure out what it does will appreciate your using them. Some 
conventions are as follows: 


¢ Modules that exist mostly to export a single class should be named after the class. 

e Class names begin with an uppercase letter. Attribute and method names begin 
with lowercase letters. 

e The name self is used to refer to the name of a class instance. 


TECHNICAL 
STUFF 


A brief privacy note 


When you're writing a class attribute, you specify that it is private by 
naming it with a single leading underscore. Unlike private attributes 
in modules and packages, however, Python doesn't treat private 
attributes in classes in any special way. The single leading underscore 
is just a convention that says to people reading your code, "don't 
touch this!" (That is, don't try to rewrite it, change it, or use it in any 
way, or else you might break something.) 


Private attributes are most often used in large class hierarchies (but 
you should use them any time you want to inform users of your class 
that this is not part of the public interface). 


If you name a class attribute with two leading underscores, that tells 
Python to use name mangling so that base classes and subclasses 
can't see the attribute (unless they manually do the name mangling 
themselves). This is an advanced programming tool because it's hard 
to use with inheritance. 


Attributes with two leading and two trailing underscores (like 
__getattr__) aren't private. They are special methods (discussed 
earlier in this chapter). 


Some OOP purists denigrate Python's privacy mechanisms because 
the privacy mechanisms don't enforce privacy. Pythonistas counter 
that anyone who really wants to break privacy can do it, and it's more 
important to document things—and then get on with your work. (In 
general, Pythonistas say, "Python is a language for consenting 
adults." That is, Python usually won't prevent you from doing 
something stupid or complex if you really want to.) 
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Inheriting the Farm: Overriding and Extending Classes 


You can change the features of a class or create new features by creating a subclass (or 
child class) that inherits attributes from its parent class and then changes some of the 
attributes inherited from the parent class. 


TECHNICAL Classes that subclasses inherit from are called parent classes, 
STUFF superclasses, or base classes. 


Creating a subclass 


To create a class that inherits from another class, you type class, the name of the 
subclass, and then, in parentheses, the name of the superclass the subclass inherits from. 
To create a subclass of the NameTag class we work with earlier, you use a class statement 
like this: 


class GeekNameTag (NameTag) : 
Tip Python 2.5 adds a new feature: classes without base classes can use empty 
parentheses (the same way functions can have no arguments). That is, you don't have 


to specify a superclass in the parentheses. For example, you could just type class 
C(): 


In general, when you create a subclass, you create only what you want to override or 
extend about the superclass's behavior. So you don't need to write an init () 
statement unless you want to make changes to the superclass's init _() statement. 


Overriding superclass methods 


To create a subclass that overrides a method of a superclass, you write a method with the 
same name as the superclass method you want to override, and you give this method 
different behavior. 


In the following example, the subclass defines a say() method that behaves differently 
from the superclass's say () method. 


class GeekNameTag (NameTag) : 
def say(self): 
print "The current value of my name is", self.myname 


Tip When not to use inheritance 


In some cases, writing a subclass that inherits from a superclass isn't the best 
solution. Suppose you are planning a conference, and you created a Registration 
class. You might think of making Registration inherit from NameTag so you can 
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use the say() method. But it's better to make a nametag attribute for the registration 
class that uses NameTag as a value, like this: 


class Registration: 
def init (self, name): 
self.name = name 
self.nametag = NameTag (name) 


Using an attribute that refers to an instance of a different class is called object 
composition. One way to describe what's going on in the above example is to say that 
an instance of the Registration class has an attribute that is an instance of the 
NameTag Class. To put it in material-world terms, if you produce a bunch of 
registration packets for a conference, each packet is like an instance of the 
Registration Class, and each package has a name tag in it, which is an instance of 
the Nametag class. 


Here's what happens when you create an instance of the class and the subclass and call 
each instance's say () method: 


SS> Ssupexclass _imgic = Neme lag (" Lineyy" ) 
PSS suloclass dims = CeekNemelacg (7 Lams ” ) 
PSS superclass ims, sayy (() 

Hello, my name is Lucy 

Pe> Suloclass_imsic, sary () 

The current value of my name is Linus 


Extending superclass methods 
There are two basic ways to extend superclass methods. You can 


e Extend a method that already exists in the superclass 
e Write anew method that uses data from a superclass method 


Extending an existing superclass method 

Extending an existing method of a superclass is similar to overriding, except that you 
also call the superclass method in the extended code. You extend a superclass method if 
you want the method to perform as written and you also want to manipulate its result in 
some way. 


To create a subclass that extends the methods of a superclass, follow these steps: 


1. Define a method with the same name as the superclass method you're 


extending. 
2. class WeirdGreeting(NameTag) : 
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def say(self): 


4. In that method's code, call the superclass method by typing the super-class 
name, a dot, the superclass method name, and its argument in parentheses. 

5. class WeirdGreeting (NameTag) : 

Ge def say(self): 

tie NameTag. say (self) 

8. Perform some additional manipulations, either on the value returned by the 


superclass method or on the subclass's instance data. 


The following subclass extends the say () method of NameTag: 


class WeirdGreeting (NameTag) : 
def say(self): 
NameTag.say (self) 
print self.myname.swapcase () 


You might call it like this: 


>>> greet = WeirdGreeting('Totoro') 
>>> greet.say() 

Hello, my name is Totoro 

tOTORO 


Adding a new method 


You write a new subclass method if you want the subclass to add some completely new 
behavior to a parent class. The following example creates a new subclass of NameTag and 
also creates a method on the subclass that extends the parent class by saving the NameTag 
data in a database. (Note: This code is incomplete; it works only if a database file has 
been created and opened.) 


class PersistentNameTag (NameTag) : 
def save(self): 
db.write(pickle.dumps (self) ) 


Using multiple inheritance 


A class with more than one superclass inherits its attributes from all its superclasses. Such 
a Class uses multiple inheritance. 


REMEMBER To write a subclass that uses multiple inheritance, type the names of more 
than one superclass in a class statement. For example: 


class SubClassName(SuperClassl, SuperClass2): 


And here's how you would create a "mutt" type of dog (this dog is probably easier to 
create as a Python class than in real life): 


class Heinz57Dog(Chihuahua, Greyhound, Mastiff, Dalmatian): 
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Namespace searching in classes and superclasses 


Subclasses have access to the namespaces of all the classes they inherit from. When a 
subclass inherits from parent classes, a namespace hierarchy is created. When other 
classes (not direct parents of the subclass) get into the act, the namespace structure is 
called a framework. 


When you are working with classes, Python searches for names in the following order 
and stops searching as soon as it finds the name: 


1. Instance 
2. Class 
3. Superclasses that the class inherits from 


Tip This is similar to how function and module namespaces work—Python searches the 
function's local namespace first, and then the module's global namespace, and then 
the main or built-in namespace. 


This gives rise to the following behaviors: 
Inheriting from a single superclass 


The names and definitions in a subclass or its instance override the same names and 
definitions in the superclass. 


Inheriting from multiple superclasses 


When a class has more than one superclass, Python searches first the instance, and then 
the subclass, as usual. When Python starts searching the superclasses, it uses a method 
called depth-first, left-to-right searching, which searches in the following order: 


1. Leftmost superclass in the class statement 


For example, in the class statement 
TotallyBizarreGreeeting (WeirdGreeting, NameTag), searching would 
begin with WeirdGreeting, the leftmost superclass named. 


Subclasses of the leftmost class 

Next class to the right in the class statement 
Subclasses of this class 

And so on... 


ge aoe 


Here is a simple example: 


class A: 
def foo(self): 
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jonealione ‘aevexe; ieteulileyel “alioy AN! 


class B: 
def foo(self): 
joreauinie MIEO@ Cebilevel aia Je) 


def bar(self): 
print "bar called in B" 


Gllsisis) (© (GA, eh) s 
pass 


YH 


If you import the preceding example into interactive mode, here's what you get when you 
create an instance of c and use it. 


1. The instance of c looks in class a first and finds the foo () method there. 


It executes that and never sees B's foo () method. 


2. When you call the bar () method, the instance of c doesn't find it in class A, so it 
looks in class B. 


This example shows the result: 


SSS x = CC) CLES Ain WAaSesMeS oir C 
SSS Sh 5 EOS (() Call the foo() method 
FOO) Ceillilerel akigh VN 

SSS sx. lovee ()) Call the bar() method 
bar called in B 


Operator interception and overloading 


You can change the way Python operators work in your classes. This applies both to 
expression operators (such as x * 3) and operations on objects (printing, calling a 
function, and so on). Inheritance is one trick that makes this possible, but Python also has 
some special tools to help: 


Hooking into the power of special methods 


Methods with names that begin and end with two underscores—special methods—are 
also called hooks. They have special behaviors. When you use an operator such as +, 
Python calls the special method __add__ behind the scenes. If you implement the 
__adda__() method for your class, Python will call it each time you use the + operator 
with instances of your class. 


Tip These methods are provided so you can make your own class instances work the 
same way Python's built-in types do. It's good practice not to change the behavior of 
the operators in ways users and readers of your program don't expect (such as 
changing _add__ so that it subtracts!). 
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Table 13-1 lists some of the operator overloading methods. To see them in the 
documentation for most of Python's built-in data types, type help (data type), 
substituting the name of the data type you need help with—for example, help (list). 


Method 


eq__(), 


Table 13-1: Some Commonly Used Special Methods 
=™>Open table as spreadsheet 


() and 


neq 


others 


_getattr_ () 


__getitem_ () 


— init () 
__iter_ () 


repr () 


__setattr_ () 


__setitem_ () 


str () 


Used For 


+ 


Comparing 


Looking up 


Retrieving an index or 


key 


Initializing objects 


Creates iterables 


Printing, converting 


Assigning a value to an 


attribute 


Assigning an index or 


key 


Converting objects to 


strings 


Syntax 


X == Y,X != y,andsoon 


X.attribute 


X[key], X[index],X in y, used in 
loops and tests 

MyClass () 

iter (X), for i in X: 

print X,X 


X.attr_name = value 


X[key] = value 
X{index] = value 
str (X) 


For a complete list of special methods, see "Special method names" in the Python 
Language Reference: http://www.docs.python.org/ref/specialnames.html. 


Super strings 


The following example class includes a method that overrides Python's builtin 


_ str. 


names in first name—last name order: 


class Emp] 
def 
sel 


oyee: 


__ imi (sels, 


f.lastname = 


lf.firstname 


sie (si@llie)) ¢ 
SGULIE . EsLieSieinel 


return "3s % 


ReEMIN SeILie 


lastname, 
lastname 
= firstname 


me: 


woo 
s J 


lastname 
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(self.firstname, 


() method. It checks whether one or two names were passed and returns the 


firstname=None): 


self.lastname) 


When you create instances of this class and print them, the behavior of str () changes. 
(The print command calls str (object).) 


>>> a = Employee('Aahz') 

>>> b = Employee('Maruch', 'Stef') 
SSS 6 = MMeuciela, Sireue! 

SSS joer’ a 

Aahz 
SSS jorimMeE lo 
Stef Maruch 
SSS joie © 
Maruch, Stef 


When to Go to Class 


You might benefit from creating classes in your program in the following situations: 
e Several functions use the same parameters. 


If your program's functions pass a lot of arguments back and forth, consider 
creating a class and making the functions into methods. Methods can access their 
data through the instance (se1f), so you don't have to pass the data. 


You could do this by using globals, but there's more of a danger that things would 
get changed when you don't want them to. Furthermore, globals don't allow you to 
create multiple instances. 


e Your program carries out the same operation multiple times. 


Classes are ideal for this purpose because each instance you create is separate 
from the others, and you don't have to reset everything to perform the operations 
again. 


e Your program is event-driven. 


An event-driven program often has a function that collects events (user actions) 
and sends them to other functions to be processed. If this event function is a 
method of an Application class, application information doesn't have to be 
passed as arguments. 


e Your program uses a persistent database. 
Class instances can be stored on disk by using the pickle or shelve modules. 
(See Chapter 19.) The pickle feature creates an external file and saves an object 


in it. pickle can also create a string suitable for storing in a database. The shelve 
feature "pickles" objects by using a database that you access with keys. 
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e You want a data type that bundles several items. 


An empty class definition can store data attributes. For more information, see 
Chapter 21. 


Chapter 14: Introducing New-Style 
Classes 


If you're new to Python and you have read Chapter 13 on classes, you might be groaning 
at the prospect of yet another chapter on classes. Actually, almost everything in Chapter 
13 also applies to the new-style classes we discuss in this chapter. New-style classes 
simplify Python's object hierarchy in ways that save you effort if you want to make 
changes to a built-in data type. They come with some new features that are of interest 
mostly in advanced programming. We discuss them here so you'll recognize them in 
programs. 


This chapter provides an introduction to new-style classes and their features. You also 
find out about method resolution order and two features you shouldn't use—static 
methods andthe slots ___ class attribute. 


An Object's Object: Intro to New-Style Classes 


The primary difference between a classic (or old-style) class and a new-style class is that 
a new-style class always inherits from object, either directly (class 

myclass (object) :) or indirectly by inheriting from one of Python's built-in types 
(class myclass (dict) £), which inherits from object. 


The original purpose of new-style classes was to allow subclasses based on built-in 
Python data types, such as dict and list. They also fix some problems with classic 
classes and add a bunch of new features (such as properties). 


New-style classes were introduced in Python 2.2; however, they were somewhat 
experimental in 2.2, and subtle but significant changes were made in Python 2.3. (In case 
you're curious, most of the changes involved multiple inheritance features.) If you're 
using Python 2.2, stick with classic classes. 

Classic Coke or New Coke? 
There is a controversy in the Python community about new-style classes. Some people 
say you should use them for all the classes in your code. Others say you should use them 


only where a regular class won't do. The controversy is based on the following facts: 


e Regular classes will go away in Python 3.0. 
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Not to worry—the vast majority of Python code written with regular classes will 
still work because classic class code isn't that different from new-style class code. 


e Python 2.1 and earlier don't support new-style classes at all, and new-style 
classes work differently in 2.2 than in 2.3 and later. 


So if you use new-style classes, your code might have problems running in 
Python 2.2 and earlier. 


We focus on classic classes in this book because they're still the default and will remain 
the default until Python 3.0; in addition, the vast majority of code in existence (including 
the Python library) still uses classic classes. 


Before Python 2.2, data types could be created only by using the C programming 
language, and classes couldn't inherit from built-in or third-party data types. If you 
wanted a new object that worked kind of like a Python dict, for example, you had to write 
the whole thing from scratch. You couldn't write a subclass based on dict because C types 
were very different from Python classes (specifically, classes were all of the class type, 
and class instances were all of the instance type). Python 2.2 introduced the type/class 
unification in the form of new-style classes. 


In the new-style class tutorial found at http://www.python.org, Guido Van Rossum 
(Python's creator) provides an example of a new-style class that extends the dict type by 
providing a default value when a key is missing: 


class defaultdict (dict): 
Get amie (selz, cetawile=Neme) s 
GHLGE, _alinalie —_— (si@ iit) 
self.default = default 


Cet  geticem (sel, ley) 3 
(EIeW 2 
wetuiem Gict. Ceiciliem (Sel#, ley) 
SRESIOIE INSEE OIE § 
return self.default 


The documentation for new-style classes hasn't yet been integrated with the rest of the 
official Python documentation at http://www.python.org. Here's where to find 
information about new-style classes: 

http://www.python.org/doc/newstyle/ 

Everything comes from object 


The basic rule for a new-style class is that it has to inherit from object. (We know what 
you're thinking: "Oh no, another use of the term object!") In this case, object is a built-in 
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type. All the Python built-in data types are really subclasses of object. If you type 
help(__builtin__) into the Python interpreter, part of what you see is a hierarchy of 
classes, starting with object: 


IiSllje) “yal leyblailic=aliel wmyeyelikes — JemialJlicaim 5 
NAME 


__ stniligdm = Biiilic=li HuUMAerIONS, Sxceorilems, amc! Otc OD iECrs. 
eee 
CLASSES 
object 
basestring 
SieIe 
unicode 
buffer 
eee il 


Methods of inheritance 


There are two ways to set up inheritance for a new-style class. You can inherit directly 
from object, or you can inherit from a built-in data type. 


Inheriting from object 
To write a class that inherits directly from object, start with this line: 
class MyClass (object): 


To make all classes in a single module new-style classes, put this statement at the top of 
the module: 


| MAcAClass = iEsfoe 
Inheriting from a built-in data type 
Your class can inherit from a built-in data type (which inherits from object). 


Use the following method if you want something that works a lot like a builtin data type 
but has a few differences (like Guido's defaultdict example). 


Warning Some functions return a specific data type. If you pass an instance of your 
subclass to a function that returns a specific data type, it may return an instance 
of a built-in type rather than an instance of your subclass. 


In this example, we create a subclass of 1ist and an instance of that subclass. When we 
use the append() method on the instance, the changed instance still belongs to the 
subclass we created. But when we use the 1ist() function with the instance, it returns an 
object of the List type. 
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>>> class MyList(list): pass 


>>> ell = MyList () 
>>> ell.append(1) 
>>> ell 
[i] 
>>> type (ell) 
Kelass "meso Willa sic 4S 
>>> ell2 list (ell) 

SSS eillilZ 

[it] 
>>> type (ell2) 
Swine / laigic lS 


New Improved Class Features 
New-style classes provide several enhancements over old-style classes: 


e The getattribute  () method 

e Anew return value for type () 

e The super () function 

e The property type 

e Class methods, including the new () class method 
e Static methods 

e The slots _ class attribute 


Getcher attributes! 


With classic classes, one of the tools of subclassing is the special method 
__getattr__(), which returns an attribute of an object. New-style classes add the 
special method getattribute  ().Thedifference between getattr_() and 
__getattribute () isthat = getattr_ () is called only if an attribute isn't found, but 
__getattribute () is called for all attributes. 


Hey, baby, what's your type? 


New-style classes give you more specific information about the instances you create from 
a Class: 


e Ifyou create an instance of an old-style class, the value returned by 
type (myinst) 1S <type ‘instance'>. 

e Ifyou create an instance of a new-style class, the value returned by 
type (myinst) is the same as the instance attribute myinst. class. 
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Calling the right superclass method 


The super () function provides a way to get at a subclass's superclasses when you are 
using multiple inheritance. You usually use it in a method to call a method in the 
superclass. To use it, type super (classname, objectname). It returns a bound superclass 
object. 


Here's how to use the super () function to re-create the example NameTag class code in 
Chapter 13. (Note that you must convert NameTag to a new-style class first.) The only 
difference is that you use super () instead of explicitly using the NameTag class. Notice 
that when you use super (), the class name argument must be the current class. 


Tip Using super () makes it easier to change the name of a parent class. You only need 
to change the class definition, not all the superclass calls in the methods. 


Extended method using (classic) NameTag superclass: 


class WeirdGreeting (NameTag) : 
def say(self): 
NameTag.say (self) 
print self.myname.swapcase () 


Extended method using the new-style NameTag class and super () function: 


class WeirdGreeting (NameTag) : 
def say(self): 
super (WeirdGreeting, self) .say() 
print self.myname.swapcase () 


The super () function has some limitations: 


¢ It is a cooperative function: Each parent class that defines the method must also 
call super (). 

¢ It works simply only when all methods have the same signature: That is, they 
must have the same parameter lists. (It is possible to use super () when methods 
have different signatures, but difficult.) 


The traditional way of calling a superclass's method is to use the name of the superclass 
you want when calling the method: 


x = MySuperclass.its_method() 
That's my property! 


Properties are attributes that you manage by using various get and set methods, so they 
are also called managed attributes. Before new-style classes came along, the only way to 
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customize attribute management was to override _getattr__and__setattr_. This 


was tricky and caused performance problems. 
Set up properties 


To set up properties in your class, follow these steps: 


1. Define a new-style class and an init () method. 
Usually, you set up a private attribute (self. x in this example) because it's only 
used behind the scenes by the methods. See Chapter 13 for more on private 
attributes. 
class MyClass (object): 
Gi limite (seit) 3 
sel. « = @ 
2. Define class methods that get, set, or delete an attribute. 
Tip It's a good idea for the names of these methods to include the words get, set, 
and del so it's clear what's going on. For example: 
def getx(self): 
joucsliane “MESiciealinves ec 
wecuiem Sel, x 
GlSHE SYeNES< (SKENINE,, 3) 8 
Wie & < O8 
raise ValueError ("Negative values prohibited: %r" % 
Xx) 
Sli, % = X 
jorealioe MSeiciesimey _ 5x 
3. Define a property and, in parentheses, pass the names of the methods you 


defined, as in the following example. 


Note that you do not call the methods (don't include parentheses after the method 
names). 


To include a doc attribute for a property (which shows up when you ask for help 
on the property), type doc= and a string inside the parentheses. 


x = property(getx, setx, doc="no negative x") 
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Use properties 


To use a property, create an instance of the class. To assign and retrieve values, pretend 
that the property is a regular attribute—Python automatically calls the methods for you. 
These examples show how: 


>>> my inst 
>>> my inst 
Ceiresling 3x 
0 
SS> iihy inst. = 10 
Setting x 
PSS TOO = wih LIMNSIC 5 x 
Getting x 
SoS too le 
10 
PrS joaine MyClass.  Gloe 
no negative x 
SSS ihy imsicox = 10 
Traceback (most recent call last): 

Walle YVsreehtias, Ilatine iL, <aie. 2 

File "a.py", line 11, in setx 
raise ValueError ("Negative values prohibited: Sr" % x) 
ValueError: Negative values prohibited: -10 


= MyClass () 
358 


fer <a 


When only class matters 


Class methods are passed the class of the object but not the object itself. They're useful 
when 


e A method needs to know only the class name and doesn't care about any 
underlying data. 

e You call methods directly on the class and you want to return an instance (for 
example, the dict. fromkeys () method). 


The following sections describe creating and calling class methods. You also find out 
about the special new  () method. 


Creating class methods 


You put class methods in your classes by defining a method (call it g() ) and then typing 
the code g = classmethod(g). 


In this example, the g() class method returns the name of the class and a value x that it's 
passed: 


class E(object): 
Clair ef(ClS, xx) 8 
ieercbicin CS,  MeMne 4p 
g = classmethod(g) 
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Tip Another way to create class methods is to use decorator syntax, like this: 


@classmethod 
def g(cls, x): 


We tell you more about decorators in Chapter 16. 


TECHNICAL Python programmers conventionally use the name cls (or 

STUFF occasionally klass) to refer to the name of the class that gets passed 
to a class method. (These weird names are used because class itself 
is a keyword and can't be used as the name of an object.) This is 
similar to the convention of using self to refer to the instance that a 
class creates. 


Calling class methods 
To call a class method, use the class name, a dot, and the method name. 


This example calls the class method g() from class E created by the preceding class 
definition. 


SS> joreiliaic. We (3) 
(ay SY, 


You can also call a class method on an instance, like this: 


SSS joueslinie 138) (()} se (3) 
(ay pS) 


What's new in class methods 

The new ___() class method deserves special mention. 

When you create an instance with an old-style class, Python first creates the instance and 
then calls the init _() method to initialize it. But this doesn't work well for 
immutable instances or with metaclasses. So new-style classes have a class method, 
__new__ (), that returns the instance. You can define both new  () and init  () in 


a class. If you define both, init  () iscalledafter new (). 


Classes that use new _() are different in several ways from classes that only use 


ei ome 
e The new () method creates a new instance, so it takes a cls parameter. 


An init () method takes a self parameter because it is called after the 
instance is created. 
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e The new () method must call its base class's new _() method to create the 


instance. 
With an init _() method, you can either call the base class's init () or 
override it. 
e The new () method must explicitly return the created instance via a return 
statement. 
e An init () method doesn't return anything. 
Here's a trivial example ofa new _() method: 


Glass cleiciyec! trem (Sivic) s 
clesc mew (CIS, “ARGS, ~leweuces) 3 
IMSEEMOS = Site, mew (derived teem, “aigs, ““lawasxc@s) 


return instance 


Unless you need to have a__new__() method in your class, it's conventional to use 

__ init _ () to process the passed-in arguments and then raise an exception if they are 
incorrect. It's a Bad Idea to use new __() to return an error value rather than a correctly 
created instance. (It's okay for new ___ to raise an exception, though.) 


Cutting through the static about methods 


Static methods are basically functions attached to a class rather than to a module. The 
difference between a static method and a class method is that a static method has no cls 
parameter. When a static method is called in a program, it looks just like a function call. 


A static method begins with a def statement, has no self or cls parameter, and ends 
with name_of method= staticmethod(name_of method). The line @staticmethod 
before the def statement is decorator syntax that also indicates a static method. See 
Chapter 16 for more information about decorators. 


Warning Don't use static methods; they add unnecessary complexity to your programs. 
Use functions if you want to do things with classes that don't require a class 
instance. 

Don't use the slots machine 

The slots __ class attribute provides a way of restricting the attributes of class 

instances. It's intended as an advanced memory-saving technique for classes that will 

have millions (yes, millions) of instances. It doesn't work well with inheritance. 


Warning Simple rule: If you don't understand metaclasses, don'tuse__slots_. 


If our warning isn't enough, take Guido's word for it: 
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slots __ isa terrible hack with nasty, hard-to-fathom side effects that should only be 
used by programmers at grandmaster and wizard levels. Unfortunately it has gained an 
enormous undeserved popularity amongst the novices and apprentices, who should know 
better than to use this magic incantation casually. 


Island of Dr. MRO 


In Python, MRO doesn't stand for Meals Ready to Open; it stands for Method Resolution 
Order. 


Tip The important thing to remember about MRO is this: If you use new-style classes, 
you don't have to worry about method resolution order, because Python handles it 
automatically. 


MRO matters when you write a class that inherits from more than one parent class. For 
example, assume your class inherits from two parent classes, and each parent class has a 
write() method. If your class calls a write () method, Python looks for it first in the 
class itself and then in the parent classes. But how will Python know which write () 
method to use? And more important, how will you know which one gets used? If you 
guess wrong, you might get the dreaded Unexpected Results. 


With classic classes (see Chapter 13), Python looks for the method in this way: 


1. It looks in the parent class listed first (leftmost) in the class definition. 

2. It looks at that class's parent classes, and so on until there are no more parent 
classes. 

3. It goes to the next class listed in the class definition, and all of its parent classes, 
and so on. 


TECHNICAL This is called depth first and then left to right order. 
STUFF 


Unfortunately, this order doesn't work with new-style classes, because they all inherit 
from object. If this order were used, the same parent class would be examined multiple 
times, and you'd still be in the position of not knowing which class's method was 
discovered first. Even worse, sometimes a parent class is supposed to be overridden by a 
subclass, but the parent class's method gets called instead! 


So the way MRO works in new-style classes is this: 
Anew-style class hasan __mro___ attribute that prescribes the order for searching parent 


classes. It is created by a complicated algorithm that guarantees that each parent class 
exists only once inthe = mro__ 
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Best of all, Python generates a class's _mro___attribute automatically, so you don't have 
to worry about the "complicated algorithm!" Just examine the _mro___attribute to see in 
what order the superclasses are searched. 


Tip The algorithm does still work left-to-right, so if you want a particular super-class to 
get its method called first, make sure that it's listed first in the classes you inherit 
from. 


Exploding Your Head with Metaclasses 


Python 2.2 and later have a functionality called metaclasses. Just as you use a class to 
create instances, you use a metaclass to create classes. 


Roles 


REMEMBER A metaclass is just a class—but it creates classes rather than class 
instances. In other words, it really is turtles all the way down. 


A metaclass doesn't really create the class. Instead, Python executes the class code (just 
like initializing a module) and then passes the class to the metaclass. The metaclass can 
do whatever it likes, and the class name is given to what it returns. 


Tip The Python community's wisdom about metaclasses is summed up by Python guru 
Tim Peters: "Metaclasses are deeper magic than 99% of users should ever worry 
about. If you wonder whether you need them, you don't." Therefore, read this section 
to find out what a metaclass is and how to recognize one when you see it, but don't 
worry about how to put one into your own program. 


In a metaclass, the first argument for a method is conventionally called cis rather than 
self. Metaclass methods operate on the class that the metaclass creates. A metaclass 
doesn't create instances, and if you try to create an instance of a metaclass, you get an 
error that looks like this: 


Traceback (most recent call last): 
TypeError: unbound method [...] 


The default metaclass for all new-style classes is type (). (Surprise! type is really a 
metaclass instead of the function we've been calling it all along. It just does a good job of 
disguising itself until you need to use its other capabilities.) 

Applications 


To use a metaclass, use the _metaclass__ name either 


e Inthe body of a class 
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e At the top of a module (if you want all the classes in that module to use that 
metaclass). 


Here's an example that sets up a class to use a metaclass. Notice that you only refer to 
MyMetaClass; you don't call it (there are no parentheses). 


class MyClass: 
_ Wwecacilass = MyMeceiC ileus) 


Even if you want to write code that creates multiple classes, you rarely need to use 
metaclasses. A function can return a class. A function that returns a class is called a class 
factory. An example of a class factory called class with method () can be found in this 
article on metaclass programming in Python: 


http://www.-128.ibm.com/developerworks/linux/library/l-pymeta.html 


The classobj() function of the new module also acts as a class factory. 


Chapter 15: Feeling Exceptional 


People who use your Python programs don't always do exactly what you want them to 
do, and sometimes a program gets input it doesn't expect. Python has built-in tools for 
handling these situations. 


This chapter describes some of Python's error-handling features, including exceptions and 
code blocks you write to capture exceptions. We also introduce you to writing your own 
exception handlers and discuss two ways of approaching error handling. 


All about Special Handling 


In Python, as in English, an exception is a special case—something that needs special 
handling. When the Python interpreter encounters something unexpected, it generates an 
internal message, or raises an exception. You write code to tell Python what to do when 
an exception occurs. Python's built-in exception code usually displays an on-screen error 
message containing information about what kind of problem occurred and where. 


Python also uses exceptions for control flow in non-error conditions, especially in for 
loops, which we discuss in Chapter 10. 


"T didn't think they would do that!" 


When you write programs for other people to use, especially programs that ask users to 
provide input of some kind, users may give your program input it doesn't expect. 
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For example, if your program asks the user to input a number, she might accidentally type 
a letter instead. If your code is set up to work only with a number, this input would cause 
your code to fail. Proper exception handling catches input of the wrong type, prints a 
friendly reminder about what input to use, and lets the user try again. That way, the user 
can correct the problem easily instead of having to restart the program or look at a 
confusing error message such as this: 


TypeError: cannot concatenate 'str' and 
‘int' objects 


Here's an example of Python's error handling. If you forget a colon when writing a for 
statement, Python prints an error message called SyntaxError: 


SSS fOr K lin rainge (5) 
Walle, Weshecliins! Ilsliney 11 
for x in range(5) 


SyntaxError: invalid syntax 


This exception handling functionality makes debugging and using programs easier. 
Without it, a program would just mysteriously quit whenever it encountered a bug. 
You've probably used programs that do that and know how frustrating it can be! 


But Python takes exception handling even further: You can write your own exception 
handlers based on Python's built-in ones. For example, if you don't want an error message 
displayed to the screen, you can design your program to catch specific errors your users 
might make and then write exceptions that give users hints about how to use your 
program correctly. 


Trying Things Out 


Python programmers handle exceptions in a code block that begins with the word try. 
Blocks that use try are often nested inside each other to provide complete error handling. 
There are two kinds of try blocks: 


e try/except: If exceptions occur in the try code, the except clause captures and 

handles the exceptions. 

e try/finally: The try code and the finally code always run. If exceptions 
occur in the try code, the finally clause re-raises the exceptions for handling 
elsewhere in your program. 


In Python 2.5, these two types can be combined into one block: 
try/except/else/finally. See "try/except/else/finally: Together at last," later in this 
chapter. 


The following sections describe the inner workings of try blocks. 
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Using try/except statements 


You use a try/except block if you want the except code to run only if there is an 
exception. It looks like this: 


jedeyy7 2 
some code 

except SomeException: 
exception-handling code 


Tracking down tracebacks 


Python keeps track of what your program does. Each time you call a function, Python 
adds the call to the call stack. Each function call adds another layer to the call stack. 
When an exception is raised, Python stores the call stack as a traceback that tells you 
exactly where the error occurred and the sequence of calls that got you there. (It doesn't, 
however, list the function parameters—you need to do your own logging for that.) 


The following example program contains an error. When you run it, Python raises an 
exception, which generates a traceback: 


### Three functions that call each other 
def a(): 
b() 
def b(): 
c() 
def c(): 
1/0 # Attempt at division by 
zero, will raise exception 
### Run function a() 


a() 


When you run the program, the traceback lists the calls in the order they were received: 


-py", line 9, in <module> 


"a.py", line 3, ina 


"a.py", line 7, inc 

# Attempt at division by 
zero, will raise exception 
ZeroDivisionError: integer division or 
modulo by zero 


a 
) 
e 
) 
File "a.py", line 5, in b 
) 
e 
0 


It works as follows: 


1. Python runs the code in the try clause. 
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If no exception is found, Python finishes the try block and doesn't run the code in 
the except block. 

If an exception occurs in the try clause, Python skips the rest of the try code and 
checks the exception against the exceptions named in the except statement. If 
there is a match, Python runs the except clause. 

If an exception occurs in the try clause that doesn't match the except statement, 
Python looks for a matching exception handler in any code that the try block is 
nested in. If Python doesn't find a matching exception handler, then Python uses 
its built-in exception handler (that is, it prints those geekspeak error messages you 
saw earlier). 


Handling multiple exceptions 


You can process several different kinds of exceptions by using try/except code. If you 


want to 


catch more than one kind of exception in a try/except block, there are two 


ways to do it: 


To handle all the exceptions the same way, list multiple exception classes 
within a single except clause. 


Pass the exceptions as a tuple (in parentheses, separated by commas), like this: 


except (RuntimeError, TypeError, NameError): 


To handle each exception a different way, use multiple except clauses. 


These clauses are tested in order; if there's a match in a clause, the rest of the 
clauses are skipped. Here's what two except clauses might look like: 


except RuntimeError: 
some code 
SSSI Iyjoelsieioe 8 


some different code 


Following is a small program that checks for correct user input. It uses a try/except 
block with a single except clause that catches two kinds of exceptions. 


y 

pr 
SACS 

pr 


raw input ("En up) 
alia. (3) 
dime “Wee muller was” 
(TypeError, Value 

ime “avlaeic elaclia Ye 


ter an integer: 


y 
lieie@@ue)) 8 


Processing exception arguments 


Some e 


xceptions use arguments (associated data, like what functions and methods use): 
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e Exceptions usually have a string argument describing what sort of problem 
occurred. Here's an example: 

Cee taye: 

° x = int (x) 

OO QRS. Wh7OSlaie wore 2 

e raise TypeError("%r is not a valid integer" % x) 

e A few exceptions also have a tuple argument that stores several bits of 
information about the problem. This is an advanced feature we don't cover in this 


book. 


To print or log the details of an exception, specify a name in the except statement to hold 
the exception instance. In the following example, the name detail is used to store the 
exception instance: 


SSS Wieye 
x = 1/0 
. except ZeroDivisionError, detail: 
jrcmiat VWOeas,, cleicaaiil 


Oops, integer division or modulo by zero 

Warning If you use more than one exception class in your except statement, put them in 
parentheses to differentiate from the exception instance name. Otherwise you'll 
get an unexpected result. 


Dealing with exceptional exceptions 


When using multiple except clauses, set up the last clause to catch any unexpected 
exceptions. A "catch-all" except clause (commonly called a bare except) should go at 
the highest level of your program. It should include code that logs the errors it catches. (It 
could also print the error and send it to a logging function and then re-raise the 
exception.) Logging is covered in Chapter 17. 


Warning Include code in your except clause that logs or prints the errors the except 
clause catches. Otherwise, Python swallows all the unexpected exceptions and 
you won't know what kinds of errors your program is really generating, which 
makes debugging harder. 

When it comes to "or else" 


A try/except block can have an else clause. The else clause runs only if the try block 
runs successfully and completely: 


e No exception is raised. 
e No return or other block-exiting statement is executed. 


Use an else clause for code that needs to run if the try clause doesn't raise an exception. 
Put it after all the except clauses, like this: 
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EIeWy ¢ 
some code 
except SomeException: 
exception-handling code 
else: 
necessary code 


Using try/finally statements 


If you need some code that a/ways runs as part of a try block, and you want to handle 
exceptions elsewhere in your program, use the try/finally block. 


The finally block is good for cleaning up—closing open files and releasing memory, 
for example. (Python usually does these things for you automatically, but in programs 
handling lots of data, it's useful to clean up as you go along.) 


TECHNICAL The try/finally block follows these steps: 
STUFF 
1. Python runs the code in the try block. 
2. If Python finds no errors, it leaves the try block and runs the 
code in the finally block. 
3. Ifa return, break, Or continue statement is executed in the 
try block, the finally clause runs on the way out. 


4. Ifan exception occurs in the try block, Python skips the rest 
of the block but saves the exception. It runs the finally 
clause and then sends the exception up to higher levels of 
your program (that is, it re-raises the exception), where you 
should have exception-handling code. 


A try/finally block looks like this: 


EIEN 8 
some code 
ialin@ Lily 
cleanup code 
Warning When you use a finally clause, don't put initializing statements (for example, 
assignment statements) in the try block. In the following code, if open () raises 
an exception, the name f doesn't actually get created. Then when the finally 
clause runs, Python will produce a NameError complaining that ¢ doesn't exist. 


ilesianyas 
£ = open (Cheese lisic”) 
process (f) # This line is pseudo-code 
finally: 

£.close() 


To avoid the problem, put the open () line before the try/finally block, like this: 


i = open (cheese lisiz”) 
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jEIEW 8 

process (f) # This line is pseudo-code 
iEalinall Lys 

£.close() 


Because it's usually a good idea to do error-checking whenever you try to open a file, you 
might want to wrap the open () line in a separate try/except statement. 


try/except/else/finally: Together at last 


Starting in Python 2.5, you can write a single block that includes all the components 
discussed in the previous sections. This code works the same way as the old code did, and 
the guidelines for how to use the statements are the same—the only difference is that you 
don't have to nest two blocks if you want to use both except statements and a finally 
statement together. 


The following example uses try, except, and finally together to create a function that 
tries to read a file and return its contents. But if there's an error reading the file, the 
function returns None instead. Whether or not the file read is successful, the function 
closes the file and returns a value: 


eli Saute _iweerel (sei llemenns)) g 
f = open(filename) 
ese We 

data 
exeejoe. ILO 

data 
Eainall ys 

£f.close() 
return data 


f.read() 
IEOIE S 
None 


I = tl 


It's easier to ask forgiveness than permission 


Early in the history of programming, errors were often handled by trying to guess every 
error that might occur and then writing a lot of if statements to process them. This 
philosophy of error handling is called "Look Before You Leap" (LBYL for short). This is 
expensive in computer processing power because the error checking occurs every time 
the user does something (enters data, clicks something on the screen, chooses from a 
menu, and so on). 


The try statement lets you assume that users will use your program correctly. It handles 
errors only when they actually occur. This philosophy of error handling is called "Easier 
to Ask Forgiveness than Permission" (EAFP for short). Props to Alex Martelli for the 
phrasing and tireless promotion of the philosophy, and to Admiral Grace Murray-Hopper, 
architect of Cobol, who brought the concept into computers (albeit in a different context 
—she was commenting on a techie's relations with bureaucrats). 


EAFP is more efficient than LBYL in several ways: 
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e You can handle all the errors in one place (the except statements). 

e You can handle unexpected errors by using an except statement with a "catch-all" 
error handler. 

e You handle errors only when they occur and not check for errors every time the 
program gets some input. 


Raising Your Code to New Levels 


When your program determines an error condition (such as depositing a negative amount 
to a bank account), the raise statement lets you force an exception even if Python 
doesn't detect an error. Raising an exception "pushes the error higher" in your program's 
code hierarchy. 


To write a basic raise statement, type the word raise and the type of exception to raise. 
This example raises the ValueError exception: 


raise ValueError 


To include a value with an exception, specify the value in parentheses after the exception. 
The value is usually a string that gives a message about the error, as in this example: 


raise NameError('What's your name again?') 


You might also see this code written in the old style, with a comma rather than 
parentheses (for example, raise NameError, ‘What's your name again?'), but 
parentheses are preferred, and the comma form won't be valid in Python 3.0 and later. 


TECHNICAL A raise statement can also include a traceback object, but we don't 
STUFF cover that in this book. 


Sometimes, when Python has already raised an exception, you might want to do 
something with it now and also force higher levels of the program to deal with it (in 
geekspeak, this is known as "allowing the exception to propagate up the calling stack"). 
To save, or re-raise, an exception, type raise ona line by itself. For example, if you want 
to send an exception to a logging function and re-raise it, your code might look 
something like this: 


ileslanyas 
some code 

except ValueError: 
log_problem () # replace this with your logging code 
raise 


REMEMBER Use a bare raise only when there is an existing exception—otherwise you 
get an exception! 
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Making Your Program Exceptional 


Python's exceptions are pretty general. Creating your own exceptions simplifies your 
error handling. It's much easier for users and other programmers to understand your code 
when you use an exception that describes what the error is, both at the point where the 
error occurs and when you handle it in an except clause. 


To create your own exceptions, write a new exception class or class hierarchy that's 
derived from Python's Exception class or one of its subclasses. 


Writing a base class for your exceptions 
Here are some tips for writing an exception class: 


e Use asa base class the Python exception that most closely resembles the 
exception you're writing. 

¢ Most exceptions have names that end in Error to maintain consistency with 
Python's built-in error handling. Yours should, too. 

e Keep exception classes simple. They should mostly only include attributes that 
can store information about the error. Separate handlers can extract this 
information and act on it. 


For example, to define a class based on Python's Exception class, write something like 
this: 


class MyModuleError (Exception): 
pass 


Developing an exception hierarchy 


If you're creating more than one kind of exception—perhaps as part of a module that 
handles several related functions—good practice is to start by creating a base class for the 
exceptions. Usually, the base class doesn't do anything. The code looks something like 
this: 


class MyModuleError (Exception): 
"""Base class for my exceptions.""" 
pass 


To make a subclass based on the base class, follow these steps: 


1. Name the subclass after its corresponding module or class, plus "Error", like 
this: 

Z, Glass Linowiewseweie 
TECHNICAL This helps you keep track of which modules your errors are 
STUFF coming from. 
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3. Put the base class in parentheses in the class definition, and type a colon. 


4. class Input 
5. On the next 
exception to 


6. class Inpu 
Wes pass 


do. 


tError (MyModuleError): 


Error (MyModuleError) : 
line, indent four spaces, and write code for what you want the 


Chapter 16: Tackling Some Advanced 


Features 


This chapter describes some of the advanced features of Python, as well as some of the 
new features of Python 2.5. 


If you don't understand parts of this chapter, don't worry. We're describing these things 
mainly so you will recognize them if you see them in someone else's code. As you work 
with Python, your understanding will improve until you're ready to use these features 


yourself! 


What's That Idiom? 


Table 16-1 shows you how some of the advanced and new features look in Python code. 


All these features are described later in this chapter. 


Table 16-1: Recognizing Advanced Python Coding Idioms 
=™Open table as spreadsheet 


Keyword or 
Example 


yield ina function 
or method 


with exp as name: 


X if C else Y 
@ 
lambda a, b: 


ax*b 


Feature 


List comprehension 
(listcomp) 


Generator expression 
(genexp) 
Generator 


with statement 


Conditional expression 
Decorator 


Lambda expression or 
anonymous function 
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Description 


Creates a new list; allows 
manipulation of list items) 


Creates a new iterator object; more 
efficient than a listcomp 


Makes a function that remembers 
where it left off 


Adds context management to 
try/finally blocks) 


Easier truth testing 
Creates function wrapper 


An expression that returns a function 


What to Do Next: Iterators and Generators 


Lists are a useful Python feature that can take up a lot of memory and processing time. 
Iterators are tools that help make lists more efficient, and generators make creating 
iterators easy. 


The itertools library 


The itertools library, which was introduced in Python 2.3, includes functions for 
making lists more efficient both to write and to use. Each of the itertools functions 
creates a different type of iterator object. The objects are designed to work together. They 
also work with the tools in the operator module to produce high-speed results. 


Here are some simple examples of itertools code. 


This itertools.imap() code takes two lists and passes their elements (one element 
from each list at a time) to a lambda function (see the "Mary had a little lambda" section, 
later in this chapter), which multiplies the elements: 


PSS m= lll, 3S; G, 4, 3, ©, 3, 234) 


SS> im = il, 2, Sp Sp Gp Vo Bp DI 
SSS for 1 iim deerrools .imejo ((lameda =, ws sev), im, im) 3 
[Neale aby 


1618 20 18 42 24 2106 
The inner lives of iterators 


Iterators are used throughout Python, and if you understand them, you will have a good 
grasp on what Python is all about. 


An iterator is a Python object that produces elements one at a time. An iterable is an 
object that can create an iterator. For example, the iterator created from a list returns the 
list elements one at a time, in order. The iterator for a dict returns the keys one at a time, 
unordered. Lists and dicts are examples of iterables. 


As if things weren't confusing enough, some iterables (such as files) are also iterators. 
For example, a file simply returns itself when asked for an iterator (technically, it returns 
self). This means that all iterators are also iterables. 


Finally, you can create functions and classes that either create iterators or are themselves 
iterators. 


Iterators are useful because they simplify code and reduce memory usage. They can also 
make applications more responsive because they do only the calculations needed to 
produce one element at a time instead of doing the work of producing all the elements 
upfront. 
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The official Python tutorial available on the Python Web site at http://www.python.org 
contains further explanation of the inner workings of iterators: 


Behind the scenes, the for statement calls iter() on the container object. The function 
returns an iterator object that defines the method next() which accesses elements in the 
container one at a time. When there are no more elements, next() raises a StopIteration 
exception which tells the for loop to terminate. This example shows how it all works: 


>>> s = 'abc' 
>>> it = iter(s) 
SSS ate 


<iterator object at O0x00A1DB50> 
>>> it.next () 
a! 
>>> it.next() 
b! 
>>> it.next () 
c! 
>>> it.next () 
Traceback (most recent call last): 
File "<stdin>", line 1, in <module> 
StopIteration 


This itertools.count () code creates an iterator that returns consecutive integers 
beginning with the argument you specify (0 by default). You can use 

itertools.count () with itertools.imap() in the preceding example to generate one 
of the lists, like this: 


SS> m= Il, Sp Gp Gr 3p G, Sp 234) 
SSS Ole 1 lim teerrools ume ((lemecd x, ws ses), m, Meerecools.cowme ()) 3 
(ICI aL, 


© 3 12 12 12 30 18 Less 


The itertools.islice() function works on a slice of a list of numbers. This code 
creates a list of consecutive numbers between 0 and 61 (xrange (62) ). It then takes a 
slice of that list (the argument 6 specifies the beginning of the slice and the argument 62 
specifies the end) and prints every third number from the slice (the argument 3 specifies 
every third number): 


SSS ele i Win WeerroOols 1slice (sdeeinege(62), 6, G2, 3): 
jpiealiMie aL, 


© 9 12 15 is 2 24 27 30 33 36 39 42 45 48 5 54 57 60 


Generators: yield for faster processing 


Generators look like functions—in fact, they are a special kind of function. They use the 
yield statement to return data (whereas regular functions use the return statement). 
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Generators are a convenient way to create iterators because they simplify the record- 
keeping needed for returning each element. (That is, you don't have to keep checking and 
reassigning values because the generator remembers the values it was assigned last.) 


Generators are different from functions in two other ways: 


Calling a generator always returns the same thing: an iterator (usually called a 
generator iterator or geniter). 


Calling a function can return any data type, depending on what the function does. 
When Python encounters a yield statement, it doesn't exit the generator iterator. 
Instead, it remembers what was happening inside the function (including any 


values that were assigned). 


When Python encounters a return statement, it exits the function, and values 
assigned in the function are forgotten. 


Because they make iterators, generators are usually used in loops. Here's how a generator 


works: 

1. When you call a generator, you get a new iterator. 

2. When you call the next () method of the generator iterator, the body of the 
generator function executes until it hits a yield statement. 

The for statement calls an iterator's next () method automatically. 

3. The yield statement passes a new value to the next () method. The generator 
iterator doesn't exit. Instead, it preserves or remembers the state of the function 
body. 

4. The next () method gets called again. Because the iterator remembers the state of 
the function, the values that were in effect before, remain in effect. 

5. The cycle repeats until Python runs off the end of the generator, a return 
statement is executed, or an exception is raised. 

6. Ifthe generator ends normally, Python raises StopIteration. 

TECHNICAL Generators are sometimes called resumable functions or generator 
STUFF functions. People who call them generator functions usually call the 


thing that they return a generator rather than a generator iterator. 
(Python can add to your stock of tongue twisters as well as help you 
become a better programmer!) 


The following code creates a generator function that filters out repeated items in a 
sequence—for example, repeated lines in a file. You prepare the file by reading it using 
the readlines() method and then sorting the resulting list of strings: 
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def uniq(input): 
prev = object () 
Fore Ibaimey shim aliajotic 
if line != prev: 
yield line 
prev = line 
Tip In Python 2.4 and earlier, you can't use yield in the try clause of a try/finally 
statement. 
TECHNICAL Python 2.5 adds some esoteric features to generators. (We list them 
STUFF here so you won't be surprised if you see them in other people's 
programs.) All three features are new methods on geniters: 


e send() allows you to pass values back into the generator. 
A consequence of this change is that yield becomes an 
expression in Python 2.5, so code like this is allowed (in 
earlier versions, it raises an error): 
foo = (yield foo) 

Starting in Python 2.5, generators are also called coroutines. 
¢ throw() raises an exception inside the generator. 

¢ close() forces the generator to exit. 


Expression and Comprehension: Listcomps and 
Genexps 


List comprehensions (listcomps) and generator expressions (genexps) let you focus on 
data rather than control structures. (See also the "Focusing on Functions" section, later in 
this chapter.) They replace a for loop with an expression. 


List comprehension and generator expression code can look intimidating if you don't 
know what's going on. But after you get used to them, you'll probably find them a lot 
easier and quicker to use than the multi-line statements they replace. Remember that you 
never have to use a listcomp or a genexp, but a lot of programmers use them, and it's best 
to have a basic understanding of how they work. 


List comprehensions 
A listcomp is a nifty shorthand way of creating a new list. You can create a new list in 


other ways, such as by using the map () function, but the listcomp syntax is more flexible, 
simpler to create, and often easier to read. 
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A list comprehension is enclosed in square brackets and always includes the keywords 
for and in. 


As an example, if you have a list of strings and you want to make sure they are all 
lowercase, you could do something like this: 


nivllasic = PUAN’, UtOOY, Voie] 
ems = [I] 

Oe alin My lalsie 3 
tmp.append (x. lower () ) 
mylist = tmp 


That's a lot of code for a simple task! (Well, Java programmers might not think it's a lot 
of code, but Pythonistas are spoiled.) Plus, anyone reading the code would have to read 
the whole for loop carefully to understand that it was creating a new list. 


Fortunately, you can use a listcomp instead. It does the exact same thing as the for loop, 
but it uses only one line of code. 


Creating a simple listcomp 
To create a simple listcomp, follow these steps: 


1. Type a name, =, and [ 
Do my iliisic = | 
3. Type the action you want done to create each list item. 


For example, if you have a list of strings and you want to make sure they are all 
lowercase, you might use the lower () method of lists. Your line of code would 
now look like this: 


mylist = [x.lower() 


Type for and the same name you used in the action. 
mylist = [x.lower() for x 


Type za the name you typed at the beginning of the line, and | 


mylist x.lower() for x in mylist] 

REMEMBER You should use a listcomp rather than a for loop when you want to focus 
on creating a list. That way, someone reading the code doesn't have to 
examine the for loop to find the purpose of the code. 


~~ Bey > 


Creating a listcomp that applies a filter 


You've seen that you can use a listcomp to do something to each item in a list. Another 
thing you can do is apply filters to the items in a list to build a new list. The following 
listcomp produces a list of numbers divisible by 5 by using the filter if x % 5 == 
Note that it uses the expression range (50) rather than a list name. 


243 


pes willlieerecd list = [x tox x im remge(50) tt x = 5 == O] 

SSS willieeireel lat sie 

IO, 5S, 10, 15, 20, 25, 30, 35, 40, 45] 

Tip Listcomps can get complex rather quickly. If your code is getting hard to read, use a 
for loop instead: 


tiltersel list = |] 
for x in range(50): 
ise 2 4 5 == Os 
ieiileeieSel Mal sic . eysissiael ((s) 


Generator expressions 


In the earlier section, "Generators: yield for faster processing," we discuss generators, 
which are a special kind of function that makes processing in loops more efficient. 
Python 2.4 introduces generator expressions. Generator expressions (genexps for short) 
are a shorthand way of creating generator iterators. 


Genexps look like listcomps, but they use parentheses rather than brackets. (However, if 
the generator expression is the only argument of a function, you can skip the 
parentheses.) 


The difference between a genexp and a listcomp is: 


e A genexp creates a generator iterator (also called a geniter). 
e A listcomp creates a list. 


Tip One way to use a listcomp or a genexp is to give it as the argument to a function that 
takes an iterable. For example, a genexp is a handy way of building a dictionary by 
using the dict () constructor. 


To write a simple genexp that creates a dict in which each number is associated with its 
square, follow these steps: 


1. Type the name of a function and (. 


In this example we use the dict () function, which creates a dictionary out of 
two-item tuples. 


elaLene (( 


Type (, a name for one item in the tuple, and a comma. 

olbeie ( (es, 

Type an expression for the other item in the tuple, and then type ) 
dict ((x, x ** 2) 


Type for and the name of the first item in the tuple. 
Galois (6k, ox ee Bi) adope sr 
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8. Type in and a tuple representing the values you want to assign to the first 


item in the tuple. Then type ) 
OS. Cicie((x, x ss 2) ioe s< aha (2, Zi, (3)))) 


If you typed the above code into interactive mode, you'd get this result: 


SSS Clee (Gk, sk we 2) aro x alin (2, 4, ©))) 

(28 Gp 48 IG, 6s 36) 

Tip If the expression would evaluate to a tuple, it must be in parentheses. (This also 
applies to listcomps. 


With What, Your Bare Hands? (The Power of ‘with’ 
Statements) 


In Python 2.5, the new with statement makes try/finally statements more powerful 
and easier to use. (Maybe even powerful enough to slay a fierce green dragon!) The 
syntax is as follows: 


with EXPRESSION as NAME: 
BLOCK 


The with statement works like this: 


¢ EXPRESSION returns a value that the with statement uses to create a context (a 
special kind of wrapper object). This is similar to the way a for loop 
automatically creates an iterator. The context is used to execute the BLocK. The 
block might end normally, get terminated by a break or return, or raise an 
exception. No matter which of those things happens, the context contains code to 
clean up after the block. 

e The as NAME part is optional. If you include it, you can use NAME in your BLOCK. 
NAME is similar to the target of a for loop (that is, the i in for i in mylist). 


To use the with statement in Python 2.5, put this code in your module: 


ICOM UNE WNES aS WLLElM, Sivelicemneimic 
TECHNICAL In Python 2.6 and later, the with keyword will automatically be 
STUFF recognized. 


The protocol used by the with statement is called the context management protocol, and 
objects implementing it are context managers. 


You can create a context manager with a decorated generator (see the later section, 
"Decorating Your Code"), like this: 


from contextlib import contextmanager 
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@contextmanager 

def opened(filename, mode="r"): 

fF = open(filename, mode) 

ieieys 

yield f 

finally: 
£.close() 


You might use the preceding code as follows: 


with opened("/etc/passwd") as f: 
Oe Aas SLi jee 
joi@sinte Laine), weSibie ajo) ()) 


TECHNICAL This canonical (standard) example is from the Python docs at 
STUFF http://www.python.org. The code creates a template for opening a file 
that ensures the file is closed when Python leaves the with block. 


Making Exceptions for Yourself 


The way you use standard exceptions and write custom exceptions changes slightly in 
Python 2.5. Here's the scoop: 


e Python 2.5 turns all standard exceptions into new-style classes (see Chapter 14 for 
more about new-style classes) and introduces a new superclass called 
BaseException. All the standard exceptions you'll use in your programs still 
inherit from the Exception class, but the Exception class in turn is a subclass of 


BaseException. 
e In Python 1.5 through 2.4, all standard exceptions are subclasses of the 
Exception Class, but any (classic) class can be raised as an exception. 


TECHNICAL Two exceptions inherit from BaseException: KeyboardInterrupt 
STUFF and SystemExit. But you have no business catching those 
exceptions. You should let them behave the way they're supposed to 
so that the user can interrupt your program. 
REMEMBER Any custom exceptions you write should be subclasses of Exception. 
(See Chapter 15 to find out how to write custom exceptions.) Although 
this rule isn't enforced now, it will be enforced in Python 3.0. 
Warning Early versions of Python (prior to 1.5) used string exceptions. Although Python 
still lets you create string exceptions, they won't be supported in the future. In 
Python 2.5 and later, string exceptions raise a PendingDeprecationWarning. 
Use them only in the (extremely unusual) case of code that needs to be 
compatible with versions earlier than 1.5. 


Under One Condition 
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Conditional expressions were added to Python 2.5 to simplify choosing one of two 
options in an expression. (See Chapter 10.) 


A conditional expression takes the following form: 


M at C else ¥ 
It works like this: 


1. cis evaluated. 

2. Action depends on whether C is true: 
o Ifc is true, then x is evaluated to give the result. 
o IfC isn't true, y is evaluated to give the result. 


You can use parentheses in conditional expressions. Because parentheses identify the 
conditional expression as a single unit, they make it easier to read in context of other 
code, as in this example: 


x = ("tiger" if fruit=="redcurrant" else "crocodile") 


Although you can write nested conditional expressions, you shouldn't because they are 
hard to read. The following example is too hard to read as a single statement. It should be 
written as an if block instead: 


# too-complex conditional expression 

cleese = ("Hungarian tourist" if sketch == "Phrasebook Sketch" else 
("Customer in search of parrot" if sketch == "Pet Shop Sketch" else 
"Fresh fruit sergeant") 


) 


# easier-to-read if block 


if sketch == "Phrasebook Sketch": 

cleese = "Hungarian tourist" 
elif sketch == "Pet Shop Sketch": 

cleese = "Customer in search of parrot" 
else: 

cleese = "Fresh fruit sergeant" 


Decorating Your Code 


Decorator syntax provides a way of modifying functions and methods. A decorator (also 
called a function decorator or function wrapper) is a function that takes a function as an 
argument and returns another function. 
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TECHNICAL Before decorator syntax made its way into Python, programmers 
STUFF modified functions and methods by calling the decorator function and 
assigning the result back to the original name of the function. 


A standard use for decorators is to implement a cache that stores the results of function 
calls. This is called memoizing. It works like this: 


1. The first time you call the memoized function, the wrapper calls the original 
function to get the result. It returns the result but also stores it in the cache. 

2. When the program calls the memoized function again with the same arguments, 
the stored value is returned. 


This speeds up the program because the original function doesn't run again. 


If you wrote a decorator function called memoize(), you would set it up to work with a 
function by using the following syntax. The @ character indicates a decorator. The 
decorator comes on the line before the function or method definition: 


@memoize 
def myfunc(myarg): 
some code 


Without decorator syntax, you write the code this way: 


def myfunc(myarg): 
some code 
myfunc = memoize (myfunc) 


REMEMBER When you pass myfunc as an argument to memoize() in the preceding 
code, myfunc doesn't have parentheses after it. That is because you aren't 
calling myfunc() as a function; you are passing the function itself as an 
argument. 


Some memoizing decorator functions can be found in the Python Cookbook: 


http://www.aspn.activestate.com/ASPN/Cookbook 
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Focusing on Functions 


Several Python tools are designed specifically to work with functional programming, 
which is a kind of programming that works via functions. 


Genexps, listcomps, and the itertools module are functional programming tools 
covered earlier in this chapter. This section introduces some more esoteric tools, mostly 
so you can recognize them in other people's programs. 


Tip If you made an easy A in calculus, you might enjoy functional programming. If you 
never made it to the calculus level, don't worry. The more complex types of 
functional programming aren't necessary for most types of programming tasks. 


Python's functional programming tools include these features: 


¢ lambda expressions create functions you can define in other function calls (this is 
useful if you don't want to create a whole separate function). 
¢ Higher-order functions take functions as input and output a function. 
Tip Decorators are an example of higher-order functions. 
e First-class functions include functions that can be either 
o Stored in a data structure (for example, using the pickle module) 
o Passed as an argument to another function 
e List comprehensions, which we discuss earlier in this chapter 
e The map(), reduce(), zip(), and filter () functions 


Mary had a little lambda 


The lambda keyword creates anonymous functions—functions that don't have names. 
(The name /ambda comes from LISP programming.) It creates a function object the same 
way the def keyword does, but the lambda function object has <1lambda> in place of the 
name of a function, as shown. 


SSS llemlocla. x2 x55 iS © 
<function <lambda> at 0x8211844> 


This code uses the above lambda expression to make a list of the numbers from 0 to 75 
that are divisible by 5: 


>>> filter(lambda x: x%5 is 0, xrange(76) ) 
[O; Sy 10, AS, 20, 25, 30; 35, 40, 45, 50, SS, 60, 65, FO, 75] 


A couple of other differences between lambda functions and functions created with the 
def keyword are as follows: 
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e A lambda is an expression, not a statement, and it can appear anywhere that an 
expression is allowed. 

e A lambda expression can't have statements in it (such as x = 2, or if or try 
statements). 


Mapping it out 


Another functional programming tool is the map () function. In its simplest form, using 
None as the first argument, it turns a sequence into a list, or multiple lists into a single list 
of tuples. 


In the following example, the map () function turns a string into a list of single-character 
strings: 


>>> mystring = "hello kitty" 
>>> map(None, mystring) 
lays ewan rule Yala Urey ' Lae Tikal Vales sete ears by! || 


e Ifyou pass a function as the first argument, map () performs the function's action 
on each element of the list or sequence and stores the result in another list. 

e Ifyou pass more than one sequence, the function needs a parameter for each 
sequence. 

e Ifthe sequences have different lengths, map () substitutes None when there is no 
corresponding value in the shorter sequence. 


Tip If you don't need this sequence-extending behavior, you should use zip (), listcomps, 
or genexps rather than map (). They're simpler. 


Applying filters 
The filter() function takes a function (which can be a lambda) and a sequence. It 


processes each element using the function and returns a sequence of the items for which 
the result of the function is True. For example: 


SSS ceice = 10, 2, Tr Sl 
SSS ial leer Clamloga x3 x5, cleice) 
[1LO0, 7H] 


Tip You can also filter elements with listecomps. See "Creating a listcomp that applies a 
filter," earlier in this chapter. 


Reductionism 


The reduce () function takes a function (which can be a lambda) and a sequence. It 
returns a single value constructed as follows: 


1. It calls the function by using the first two items of the sequence as arguments and 
returns the result. 
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2. It calls the function by using the result just returned and the next item of the 
sequence. 


3. And so on until it runs out of items. 


This example performs x * y ona list of 4 items: 


SSS miviaiset = (2, 4, G, sl 
>>> reduce(lambda x, y: x*y, mylist) 
384 


Tip For most purposes, reduce () has been replaced by the built-in function sum(). The 
sum() function takes a sequence of numbers and adds them, as shown: 


>>> sum(range (20) ) 
190) 
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Chapter 17: Using Python's Primary 
Services 


In this chapter, you find out about many of the tools built into every installation of 
Python. 


We call this the "primary services" chapter because almost every non-trivial program uses 
some (but not all) of the features listed in this chapter. 


Python: Batteries Included 


The tools that come with Python are collectively called the standard library. The standard 
library contains commands, modules, and packages. 


TECHNICAL Sometimes you'll see modules or packages that come with Python 

STUFF referred to as libraries, which is short for standard library module. 
This book uses module and package, though. Chapter 12 explains the 
difference between modules and packages. 


The standard library is like a basic education for Python, allowing it to work with 
different operating systems (UNIX, Windows, and Mac OS, for example), to handle 
many kinds of data, to navigate the Internet, to work with different human language 


systems, and so on. 


There are so many features that people often say that Python comes with "batteries 
included." 


REMEMBER Remember to import modules 
Python's abilities come packaged in modules so you don't have to think 
about tools you aren't using right now. To use a tool, you usually have to 


import the module it's stored in. 


The recommended way to import modules is to import the whole module 
by typing 


import modulename 


We describe other ways to import tools in Chapter 12. 


REMEMBER This key point will save you many many many hours of coding: 
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Before you start writing a Python program to perform a function, check 
whether the function already exists in Python's libraries. Hundreds of 
programmers have developed these tools so you don't have to. 


Each new version includes lots of feature upgrades and additions. So when you upgrade 
to a new version of Python, be sure to read the "What's New" Web page for the version at 
http://www.python.org/. The new features of Python 2.5 are available here: 


http://www.docs.python.org/ 


You Get All This!—The _ builtin. Module 


The tools inthe builtin module are available whenever you use Python. You don't 
have to import them. To see what's in the module, type dir(__builtins __). (Don't 
forget the ‘s’—although the module is called builtin__, the alias for the module is 
builtins .Ifthat bothers you, you can type import _ builtin to use the actual 
module name.) 


If you don't recognize most of the names, that's okay. A lot of what's in builtin _ 
works behind the scenes, at least when you're just starting out. Curious? Get a description 


of a particular item in the library by typing help (name of item). 


Most of the builtin __ tools that you need to know about are described elsewhere in 
this book. This section describes a few additional tools you might find useful. 


Seeing what's inside Python objects 


The dir() function returns an alphabetized list of an object's attributes, such as functions 
or methods, constants, and the name of the object. 


Here's how to use the dir () function: 
e To see the names in the current namespace, type dir(). 


If you type dir () immediately after starting Python's interactive mode, you will 
see something like this: 


>>> dir() 
[ burltins: "> doc. — “y- * name." ] 
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To see all the attributes (named items) belonging to an object, type dir (name of 
object). For example: 

>>> dir (ArithmeticError) 

[PE SCULASIS =  SO evebencicies ar We yolaltehy bs NGheyols 

U_ G§@eacrteloure  ", 


Un sepeneabieretae on ee Savsticlen eh Uo sabigyaie, eee igvenipe ie 
Weeducea "sn 

Y___ weCMIeS Ox ", 9 ieeyue ", 9 S@icencer 7, 
v__ s@tsicace 

ste ", 9 wmicodes ", "awgs", message! | 


To see the attributes of a specific attribute, separate the object name and the 

attribute name with a dot, like this: 

chiic (AriiclimiSicieiiecoe, Coe _) 

Tip You can also see some attributes by typing help with the name in 
parentheses, for example: 


>>> help (ArithmeticError) 


The dir() function shows you these types of attributes: 


For a module, it returns the module's attributes. 

For a type or class object, it returns the object's attributes and those of its base 
classes. 

For any other object, it returns the object's attributes and the attributes of its class 
and base classes. 


Reading and writing files 


The open() function sets up a file object in Python. It returns a file data type. 


Warning The built-in documentation for open () for Python 2.2 through 2.4 has an error. 


It states (or implies) that because open () is an alias for file (), you should use 
file(). This is incorrect. Use open () rather than file() so that your code will 
work with all versions of Python. 


The following sections describe most of the argument options and the methods of the 
open () function. 


Options for the open() function 


Here are the rules and options for open (): 


You should give a name to the file you're opening, or else it will be difficult to do 
anything with it. 
The first item in the parentheses is a string containing the filename. If the file 


belongs in a particular directory, type the path to the directory. For example: 
"my dir/my file' 
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The se 


cond item in parentheses is a string representing the mode. If you leave this 


out, the mode defaults to 'r' for "read-only." The modes are described in Table 


17-1. 


Mode 


'rt 


Table 17-1: Modes of the open() Function 
=™ Open table as spreadsheet 


Meaning 
Open the file as read-only (allowing no changes). This is the default. 
Open the file for writing. Create the file if it doesn't already exist. 


Warning Using 'w' erases anything currently in the file. 
Open the file for appending. Create the file if it doesn't already exist. 
Treat the file's contents as binary. 


This code is used mostly with Windows. Most other OSes these days don't 
differentiate between text and binary files. 

Allow both reading and writing. Add it to the 'r' or 'w' mode like this: 

' rt ' 

Use universal newline support; for reading text files created on Mac, UNIX, 
or Windows. Works only with 'r'; doesn't work with ‘ +'. Type it on its 
own or add it to 'r' mode like this: 'ru' 


So to open a file for reading, type code like this: 


myfil 


n(filename, 'r') 


= op 


Saving changes to disk 


To save file changes to disk, you need to use one or both of these methods: 


The flush () method forces Python to take data in the internal buffer and save it 
to the file. Use it when you want to write changes to disk without closing the file. 
The close() method closes the file object that is created with the open () 
method. You can't read from or write to the file after you close it. Use it when 


you're 


finished with the file. 


Reading file contents 


You can read the contents of a file in several ways. Which one you use depends on the 
format you want for the file's contents: 


For a string, use the read() method. It returns the whole file or up to a specified 
number of bytes as a string. This example returns 27 bytes: 
>>> myquote = open('quotes.txt') 
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e >>> myquote.read (27) 

e 'Tf you like laws and sausag' 

¢ Fora list of strings (each string is one line of the file), use the read-lines () 
method. The following example reads the whole file. 

e >>> myquote.readlines () 


° ['If you like laws and sausages, you should \n', ‘never watch 

e ither one being made. \n', '-- Prince Otto von Bismarck 
(aicieienisy, )) Vin” , 

*  "\n!, “Not to mention web sites. \n¥, "=— SJM\n"] 


e Fora string representing each line, use a for loop—which is also best if you want 
to read a file piecemeal rather than in a big gulp. 
e >>> for line in myquote: 


e ae jouesiate Laine), eS ie ie aly) (()} 


e If you like laws and sausages, you should 
e never watch either one being made. 


e -- Prince Otto von Bismarck (attrib.) 


e Not to mention web sites. 
e -- SUM 
Tip In the preceding example, the string method rstrip() removes white space 
and the end-of-line character (\n) from the strings. Because print also 
appends \n, if you don't use rstrip (), the output will be double-spaced. 


Writing information to a file 


The write () method writes a specified string to the file. (The file must have been opened 
for writing or appending.) 


Tip Information written to a file may not appear in the disk copy of the file until you 
have flushed the buffer and closed the file. 


This example creates a file object (pithy sayings) and writes a string to it: 


>>> pithy sayings = open('myquote', 'w') 

>>> mystring = "Entropy isn't what it used to be." 

>>> pithy sayings.write(mystring) 

>>> pithy sayings.close() 

TECHNICAL File objects don't support Unicode strings (see Chapter 6). To write a 

STUFF Unicode string to a file, you need to encode it as UTF-8 first, like 
this: 


myfile.write (mystring.encode('utf-8') ) 

Tip If you have a file (or file-like object) open for writing, you can use the >> operator 
with the print statement to redirect output to the file. The print >> statement is 
useful for logging or to make sure that a command-line program prints output to the 
screen when stdout is redirected elsewhere, like this: 
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print >> sys.stderr, "Danger! Danger!" 
Working with attributes 


To examine, specify, and change an object's attributes indirectly, you use the hasattr(), 
getattr(),and setattr() functions. One use for these functions is converting user 
input into an attribute of a class. 


¢ To find out whether an object has a particular attribute, type hasattr and then, in 
parentheses, specify the name of the object and the attribute name as a string 
(separated by commas). 

O SSS iMeseliricie (mete, cosa! )) 

OAS 


The result is True if the string matches the name of one of the object's attributes 
and False if not. 


e To find out the value of an object's attribute, use getattr() with the object name 
and attribute name (again, as a string). If the attribute exists, Python returns its 
value. 

e >>> getattr(math, 'pi') 

OO 6S, IAIS QZ OSS dDS9 79S 


If the attribute doesn't exist, Python raises an AttributeError by default. But 
you can specify a third argument to the function: a value you want returned if the 
attribute isn't found. This value is often None. 


>>> x = getattr(math, 'mary poppins', None) 
Po JOIGLIMIE 3x 
None 


Tip An easier way to get the value of an attribute is by typing the object name, a 
dot, and the attribute name: math. pi. This returns an AttributeError if the 
attribute doesn't exist, so if you want to return a value instead, use 
getattr(). 


e To assign a value to an object's attribute or to create a new attribute and assign a 
value at the same time, use the setattr() function. 


In the parentheses, type the object name, the attribute name as a string, and the 
attribute's value, like this: 


setattr(myobject, 'my attribute', 'my value') 


If the attribute exists, the new value is assigned to it. If the attribute doesn't exist, 
Python creates it and assigns the new value. 
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Finding largest and smallest items 


To find the largest item, use the max) function. In the parentheses, specify a sequence, 
like this: 


SS er =) (ibe SS, Se Wl 


>>> max (a) 
7 


To find the largest of several items, specify them all as arguments to the max () function. 
Separate arguments with commas. Here's an example: 

SS seh = ee Se Sin 7 

SSS lo = 100, 200, Sod] 


SSS mles< (4, 19) 
[L00, 200, SOO] 


The min () function works the same way but returns the smallest item or argument. 
Getting input from users 
The raw input () and input () functions read information from Python's standard input 


file (sys.stdin). You usually use them to collect input that a user types in response to a 
prompt or question. 


raw_input() 

The raw input () function prints a prompt string to Python's standard output file, 
sys.stdout. (By default it writes output to the screen.) The function returns the user's 
input as a string. (The prompt string is optional, but you'll usually want to include one.) 
You specify the prompt string as the argument, like this: 

raw_input ("How old are you? ") 

input() 


The input () function expects the user to enter a valid Python expression and then it 
evaluates that expression. 


Unlike raw input (), the input () function can return any valid Python object, not just a 
string. 


Warning It's safer to use raw_input () and have your program convert the string into a 


Python expression. Because the input () function calls eval (), it could be used 
to run malicious code. (See "Evaluating a string," later in this chapter.) 
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If input () doesn't receive a valid Python expression, it raises an exception, as in this 
example, in which the user input is pi: 


SoS alinjouie (Wayne als; 3}, 14S?) 

iilevenc als) 3} 4S) 2 foul 

Traceback (most recent call last): 
Hille V<ssechins”, ilaime i, aia <mechniles 
File "<string>", line 1, in <module> 

NameError: name 'pi' is not defined 


Finding an object's type 


To find an object's data type, use the type () function. In parentheses, give it the name of 
the object. It returns the object's data type—for example, string or function. This example 
returns the type of the function round(). 


>>> type (round) 
Ki7oe Vowullliciin imeem Ore WSieleycl! > 


Reloading a module 


The reload() function allows you to re-import a module without quitting and restarting 
Python. Reloading is useful when debugging a program that takes a long time to start up, 
when programming a GUI, or when Python is embedded in another application. 


Warning The reload() function should never be used in a live application because it 
does not modify existing references to the contents of a module. 


Evaluating a string 


The eval () function takes a Python expression in the form of a string, converts it into a 
code object, runs the code, and returns the result. You use it when you want to give your 
users direct access to Python. 


Warning The eval () function is a security risk, so if your code needs to be secure, it's 

better to use some other approach (for example, setattr() ora dictionary). 
TECHNICAL The eval () function also works with a code object, but that's an 
STUFF advanced feature we won't be documenting in this book. 


This example uses eval () to solve an expression. (You wouldn't use this code in a real 
program, though—you'd simply type print x+1!) 


>>> x = l 
SS> jorealine Cyvelll (Yar il!) 
2 


You can use either of these optional arguments with the eval () function. They go after 
the string. You must pass them in this order: 
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A dictionary of global names, used as the global namespace for evaluating the 
expression. This defaults to the current global namespace's dictionary. 

A dictionary of local names, used as the local namespace for evaluating the 
expression. 


But Wait, There's More—The sys Module 


The sys module is for interacting with and controlling the Python interpreter. It includes 
information about the operating system and the version of Python you're using. Most of 
the time, you don't interact directly with what's in the sys module, but you should know 
about a few tools it offers. 


REMEMBER You must import the sys module before you use it. 


import sys 


To work with a feature in the sys module, type sys, a dot, and the name: 


sys.exit () 


These are the items in the sys module you'll probably use most often: 


sys.exit() function: Tells the Python interpreter to quit. It's the most direct way 
to end a Python script while it's running. 

sys.argv list: Stores any command-line arguments passed when you started 
Python. Also includes the name of the program you're running. 


Use sys.argv when writing utility scripts that accept parameters on the command 
line. This example shows the contents of sys.argv after you run a program called 
add. py and pass it the parameter 3. 

% jOwelacim oil acleljoy 3 
>>> import sys 

>>> sys.argv 
['add.py', '3'] 


The sys module also contains the following types of objects: 


Input, output, and error files 
o stdin: Used by the input () and raw input () functions; accepts input 
from user 
o stdout: Used by the print statement; in interactive mode, prints to the 
screen 
o stderr: Stores error messages 
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¢ Objects that store information specific to your version of Python. Examples 
include pathnames, the list of available modules, and character encodings. 
Functions beginning with get and set let you access and change this information. 
¢ Debugging and exception-handling tools. 


Solving OS Incompatibility—The os and subprocess 
Modules 


The os and subprocess modules include code that lets Python work with your operating 
system—they even run operating system commands. 


These modules give Python ways of accessing the operating system that don't require you 
to write code specific to a particular operating system. Therefore, you should import and 
use them if you're writing a program that needs to work on more than one operating 
system. 


The os module is best for the following tasks: 


e Working with paths and permissions (test for access to a path, changing 
directories, changing access permissions and user/group IDs) 
e Working with files (open, close, write, truncate, create links) 


Tip Another module that's useful for copying and moving files is shut il (which 
stands for "shell utility"). See Chapter 19. 


The subprocess module, new in Python 2.4, lets you safely interact with the operating 
system to run commands and get information out of them. 


Think of the os and subprocess modules as the way that Python does the same things 
you do with your operating system directly (such as starting programs and manipulating 
files). If you work on more than one operating system, you must remember different 
commands and syntax for filenames, for example. Python's os module helps Python 
remember these things. For example, the os.path module handles pathnames, so you 
don't have to write special code to handle both UNIX pathnames (which have forward 
slashes) and Windows pathnames (which have backslashes). 


Working with the os module 


Tip This section lists some useful os functions and data. To find out more about the 
abilities of the os module, type help (os). 
REMEMBER Except where indicated in the following sections, all the tools of the os 
module work with UNIX, Mac, and Windows operating systems. 
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Manipulating directories and their files 
Use these functions to work with directories and files on a computer system. 


* os.chdir() changes the current working directory to the specified path. Specify 
the path as a string, like this: 

e os.chdir('/users/stef/Quote') 

* os.getcwd() returns the current working directory as a string, like so: 

e 6 6>>> os.getcwd() 

e '/users/stef/Quote' 


¢ os.mkdir() creates a directory with the path you specify. Specify the path as a 
string, as follows: 
e os.mkdir('MyNewDictory') 


To set a directory's permissions, specify a four-digit octal permissions mode as the 
second argument. The default mode is usually 0777, which gives everyone full 
access to the directory, but your operating system might override this default. 
Here's how to specify the default: 


os.mkdir('MyNewDictory', mode=0777) 


* os.makedirs() works like os.mkdir(), but it creates directories recursively. For 
example, if you're in directory A and you want to create directory B that holds 
directory C that holds directory D, type this: 

e os.makedirs('B/C/D') 

Warning os.makedirs() doesn't work with Windows Universal Naming 
Convention (UNC) paths, which use the \\host\path format. 

* os.remove() deletes the item on the path specified. Specify the path as a string. It 
doesn't remove directories. In Windows it raises an error if the file is in use. The 
code looks like this: 

e os.remove('R/test') 

* os.rmdir() removes a single directory on the path specified, and 
os.removedirs() removes multiple directories. We recommend using 
shutil.rmtree() rather than these functions. See Chapter 19. 

* os.walk() creates a generator object that lists the files in a directory tree. It 
generates a three-item tuple (directory path, directory names, filenames) and 
examines the directory in top-down (default) or bottom-up order. To make and use 
the generator object, write code like this: 

OQ SS> ior ioe, cles, iTillSss sin Oswell (emoeires”)) s 


e Pee eLMe woor, chirs, ines 
You use three ames in the loop because the generator makes a three-item tuple. 


* os.tmpfile() creates a new temporary file. The file is opened for writing and is 
binary ("w+b"). It's automatically deleted when you close it. 
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Tip The tempfile module provides more tools for creating temporary files; use 
that instead of os.tmpfile() if you want more than the basics. 


Wandering down the os.path 


The os. path module is a submodule of os. It contains tools that handle filenames so that 
your program can work on multiple platforms. 


Tip We recommend that you import the whole os module when you need to use os. path. 
If you don't, Murphy's Law dictates that you'll need something else in the os module 
and forget you didn't import it and get an annoying error. 

TECHNICAL Many of the os.path functions don't work with Windows Universal 

STUFF Naming Convention (UNC) paths, which use the \\host\path 

format. 


Descriptions of some important os.path tools follow. All these tools take pathnames as 
arguments. Use strings to specify pathnames. 


* os.path.join() is one of the most important os.path tools. It takes one or more 
paths and joins them by using the current operating system's path separator. If any 
component is an absolute path, all previous components are thrown away. Here's 
an example: 

e >>> os.path.join('users', 'stef', 'Quote') 


On UNIX this produces: 


"users/stef/Quote' 


On Windows it produces: 


"users\\stef\\Quote' 

Tip On Windows, os.path.join("c:", "foo") creates a path relative to the 
current directory on drive c: instead of creating c:\\foo. To get an absolute 
path on Windows, use this syntax: 


OS peli jOim (Tes\\", Vico") 


* os.path.abspath() takes a relative pathname and returns the corresponding 
absolute pathname, like so: 

QO SS) OS /Oeielm . aOSOSiela (Y OuIerES ¥ ) 

e '/users/stef/Quote/' 

¢ os.path.normpath() converts path names in nonstandard formats to standard 
format. For example: 

e >>> os.path.normpath('stef//./Quote') 

e 'stef/Quote' 

* os.path.split() takes a pathname and returns it in two parts: the directory part 
and the filename. 
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>>> os.path.split('/users/stef/Quote/quote2.new') 
('/users/stef/Quote', 'quote2.new') 


If you specify a directory, it returns the path and an empty string, like this: 


>>> os.path.split('/users/stef/Quote/") 
(" /fusers/stat/Ouorz"”, '") 


Tip On Windows systems using UNC, use os.path.splitunc() instead. 


os.path.split () has two related functions: 


os.path.dirname() returns the directory name of the path (the first half 
of the pair that os. split () returns). 

os.path.basename () returns the filename (the second half of the 
os.split() pair). If no file was specified, the os.path.basename () 
function returns an empty string. 


os.path.splitext () (which stands for "split extension") splits the pathname 
into a pair of pathnames: 


12) 


{@) 


OE © © 


(2) 


The first part is the pathname, including the part of the filename before the 
extension. 

The second part includes a period and the file's extension. 

>>> os.path.splitext ('/users/stef/Quote/quote2.txt') 

(" uses / See /OUOca/cuece2), Y exe”) 

TECHNICAL If there is no extension or no file, the second part is an 
STUFF empty string. 


This example puts os.path.split() and os.path.splitext () together: 
>>> dirname, filename = 
os.path.split('users/stef/Quote/quote2.txt") 

>>> basename, extension = os.path.splitext (filename) 

>>> dirname, filename 
('users/stef/Quote', 'quote2.txt') 
>>> basename, extension 

(Vepone2!,  Y sioxe)) 


os.path.exists() returns True if the specified path exists; otherwise, it returns 
False. It also returns False for broken symbolic links. 

os.path.isdir() returns True if the specified path is an existing directory. It 
follows symbolic links. 


Seeing environment variables 


The os module includes a dictionary, os .environ, that stores your operating system's 
environment variables (which contain user and system preferences). The variables and 
their values are stored as strings. The dictionary is set up when you start Python. If you 
change your environment variables outside Python, the dictionary doesn't reflect the 
changes until you restart Python. 


To see the value of an environment variable, type os.environ and then, in square 
brackets, type the name of the variable in quotation marks, like this: 
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>>> os.environ['HOME"] 
0 IBS / SESE 


Warning Although it's possible to change environment variables by changing the 
os.environ dictionary, it's not safe to do so. Use the Popen() function of the 
subprocess module to change environment variables instead. (The changes 
affect only the subprocesses.) 


Subprocessing 


The subprocess module, new in Python 2.4, lets you interact with the operating system 
to create new processes, pass information into and out of them, and get their return codes. 
It replaces a number of functions and modules available elsewhere in Python, including 
the following: 


os.system 
os.spawn 
OS.popen 
popen2 
commands 


The subprocess module is safer than the preceding methods of starting new processes 
because its default is to create a new process directly rather than creating them through 
the shell. That means that, with subprocess, you can safely pass shell special characters. 


There are two ways to use the subprocess module: by creating a Popen instance and by 
using the cal1() function. The Popen method is more complicated, so we only document 
the call() function here. 


Introducing the call(Q) function 


The function subprocess.call() is a shortcut for one of the more common uses of the 
subprocess module. It runs a command, waits for the command to finish, and then 
returns its return code. This simple example lists the contents of a directory. The 0 after 
the directory listing is a return code—a status message that the subprocess sends. On 
UNIX, 0 indicates that the subprocess completed successfully. 


SSS Suloprocess. esl (Vile, ‘Mayadas ]] )} 

copyy-Py helloworld.pyc mystuff.zip Sie WIE , Zajo 
copyy-pyc myspider.py pagecount.py SWUEIEZ . wifo) 
helloworld.py myspider.pyc pagecount.pyc x 

) 


Useful calling parameters 


Here are some of the more useful parameters of the cal1() function: 
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¢ args, a string or sequence of program arguments. The program is the first item in 
the sequence, and the other items are any command-line arguments the program 
takes. 


In the above example, args is a list. The program is '1s' and the command-line 
argument is 'MyDir'. 


TECHNICAL On Windows, args is converted to a string; the converted 
STUFF value may not work with all Windows applications. (It works 
with applications that support the rules of MS C runtime.) 


¢ shell=True specifies to run the command through the shell. The default shell is 
sh (UNIX) or whatever is specified by COMSPEC (Windows). To specify a 
different shell, use the executable parameter. 

e env defines environment variables for the subprocess. These only apply to the 
subprocess; they don't change any environment variables defined in your main 
shell. 

e stdin, stdout, and stderr define how data and error messages are transmitted to 
and from the child process. Values for these parameters include PIPE, an existing 
file descriptor (a positive integer), an existing file object, or None. PIPE creates a 
new pipe to the child. None specifies not to redirect messages (the child's file 
handles are inherited from the parent). To send error messages to the same place 
as other output, set stderr=STDOUT. 

¢ cwd specifies a directory to change to before the child process runs. 


Trying an OS command with call() 


This example code tries to run a command using cal1() and prints information about the 
success or failure of the attempt. The name ret code stands for the return code that the 
command sends. To use this try block in your own code, just fill in your command and 
argument in place of "mycmd" + "myarg": 


jenny. 
retcode = subprocess.call("mycmd" + " myarg", shell=True) 
iit PECCOCEe < Os 
print >>sys.stderr, "Child was terminated by signal", -retcode 
else: 
print >>sys.stderr, "Child returned", retcod 
GE'S OSLLLOK, Se 
print >>sys.stderr, "Execution failed:", e 
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Staying on Time with the datetime and time Modules 


Computers usually store time as the number of seconds that have passed since some 
specific date (called the epoch). The exact date varies depending on the operating system. 
The UNIX operating system usually uses January 1, 1970. 


Humans don't think of dates in terms of large numbers of seconds, so one of the functions 
of the datetime and time libraries is to convert this machine time into date and time 
information you can use. 


Python has two primary modules with date and time tools: 


e The datetime module includes tools for working with dates, times, and 
combinations thereof. It supports several different calendar systems and ways of 
presenting time zone information (but we don't cover time zones in this book). 

e The time module includes tools for working with times and dates in the recent 
past to near future. It focuses mostly on manipulating time based on the 
computer's internal representations of time. For example, the time module 
includes a sleep () method, which lets you pause execution of your program 
while the computer counts down for you. 


Using the datetime module 
When you import the datetime module, you have available a datetime () object that 
stores both date and time information. You can also work with dates and times separately 


by using date() objects and time() objects. They work much the same way as the 
datetime () object, so we don't cover them here. 


Components of a datetime() object 


An object that stores date and time information looks like this: 


datetime.datetime(1969, 7, 20, 22, 56) 


Here's more information about the components: 


e datetime.datetime is the type of object. 
e The arguments are integers, separated by commas, in this order: 
o Required arguments: year, month, and day of month 
o Optional arguments (which default to 0): hours, minutes, seconds, 
microseconds. Another optional argument is time zone information, which 
defaults to None. (Using time zones in Python is an advanced feature 
beyond the scope of this book.) 
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The preceding object leaves out seconds, microseconds, and time zone 
information. 


When you specify optional arguments, you must specify all arguments down to 
the level of precision you need by typing a number for any arguments to the left 
of the ones you want to specify. It's okay to leave out arguments to the right. For 
example, to specify no hours and 59 minutes, you type 0 for the hours and 59 for 
the minutes. You can leave out the seconds. The code looks like this: 


datetime.datetime(1961, 12, 31, 0, 59) 
Formatting date and time information 


Python stores a datetime () object in exactly the same way you entered it: 


>>> moonwalk = datetime.datetime (1969, 7, 20, 22, 56, OQ) 
>>> moonwalk 
datetime.datetime(1969, 7, 20, 22, 56) 


If you want the date and time in another format, you have a few built-in options, 
including the str() function, the ctime() method, and the isoformat () method (we 
don't document the latter two). 


Tip The most flexible way to display a datetime () object, however, is to use the 
strftime() method ("strftime" stands for "string-format time"). This method takes a 
string as an argument and uses % codes inside the string to specify how to format 
each element of the datetime () object. (The strftime codes work like the string 
interpolation codes described in Chapter 6, but the meanings of the codes themselves 
are different.) The datetime () formatting codes vary slightly by operating system. 


Here's one way to format the moonwalk object: 


1. Type moonwalk.strftime 


2. On the same line, in parentheses, type the following string: 
3. ("SA, %B %d, %Y %1:%M:%S") 


This string tells Python to display the time using a long day of the week (2a), long 
month (%8B), date of month (2d), four-digit year(sy), and time. It uses commas and 
colons as punctuation. 


Here's the line of code and the result: 


>>> moonwalk.strftime("%A, %B % 
VSincley, diblhy 2O, ISG) lO ss6s00 
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Adding time 


When you perform arithmetic on datetime () objects, you get the result in a 
timedelta() object. Its attributes are days, seconds, and microseconds. 


TECHNICAL Here is the timedelta() object you get when you subtract the date 

STUFF of the moon walk from today's date (in this example, "today" is May 
21, 2006, approximately 5:25 p.m.). The example shows that 13,453 
days elapsed between the first moon walk on July 20, 1969, and May 
21, 2006: 


>>> now = datetime.datetime.today() 

>>> moonwalk = datetime.datetime (1969, 7, 20, 22, 56) 
>>> now - moonwalk 

datetime.timedelta (13453, 66596, 407968) 

>>> print now - moonwalk 

IS453 Claws, ssa 's6, “07 M66 


The timedelta() object works with the following operations: addition, subtraction, 
multiplication, floor division, and absolute value. The timedelta() object is immutable, 
so it works as a dictionary key. 


REMEMBER When you create a timedelta() object, you don't have to use days, 
seconds, and microseconds. (That would be tedious for large time 


differences!) 


Follow these steps to specify a timedelta: 


— 


Type datetime.timedelta. 
2. On the same line, in parentheses, specify any of the following keyword 


arguments, separated by commas: 
3. weeks, days, hours, minutes, seconds, microseconds, milliseconds 


All the arguments are optional and if left out default to 0. For example, to specify 
one day, five minutes, and three seconds, type this: 


datetime.timedelta (days=1, minutes=5, seconds=3) 


This is the timedelta object created and Python's result. The result doesn't look exactly 


like what you entered because Python internally converts your data to days, seconds, and 
microseconds. 


SSS CAEScimMe . timecellica (1, 3,0, 0, 5) 
datetime.timedelta(1, 303) 


When printed as a string, the preceding timedelta() object looks like this: 


>>> str(datetime.timedelta(1, 303) ) 
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i cey, Os0s3O03 
Taking your time 


The time() module includes tools for working with the computer's internal 
representation of time. Tools depend on your operating system. 


The time () module doesn't handle times or dates in either 


e The far past (before 1900 A.D.) 
e The far future 


TECHNICAL The future time and date of the cutoff depends on your 
STUFF operating system; for UNIX systems, it's 2038 A.D. 


To work with far past or far future dates, use the datetime () module instead. 
Using the time() object 


A time() object is stored as a sequence of nine integers: year, month, day of month, hour, 
minute, second, weekday, day of year, and whether daylight savings time is on (1 ) or off 


(0). 


The time() object is a tuple in Python 2.1 and earlier. In later versions, it is another data 
type called struct _time(). (We're telling you this because you might see an error 
message referencing struct _time() objects.) Time is also expressed as a floating point 
number, for example, by the time.time() method. 


To create a time () object that stores the current time and date, type this (you'll get a 
different result, of course): 


>>> time.localtime() 
(2006, 2. ly 18, 9, 4, 2 82, ©) 


To see the local time as a string, type this: 


>>> time.asctime () 
"Wed Feb 1 17:35:29 2006' 


To format a time object as a string, use the strftime() method. See "Formatting date 
and time information," earlier in this chapter. 
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Let's pause for a moment 


The time module's sleep () method is useful for programs that need built-in pauses. 
(Quiz shows, anyone?) The sleep () method pauses the execution of a program for a 
specified number of seconds. 


The following example uses the sleep () method: 


import time 

for x in range(3): 
print "The time is now", time.asctime() 
print "Now sleeping for 2 seconds..." 
time.sleep (2) 


This is the result of running the example code: 


The time is now Sat Feb 4 17:28:08 2006 
Now sleeping for 2 seconds... 
The time is now Sat Feb 4 17:28:10 2006 
Now sleeping for 2 seconds... 
The time is now Sat Feb 4 17:28:12 2006 
Now sleeping for 2 seconds... 


Keeping Track with the logging Module 


The logging module is new in Python 2.3. Use it to set up log files or displays for the 
errors and debugging information for your programs. Python has very flexible logging 
features; we get you started with a few basic ways to use logging. 


Getting started with basic logging 
The tool that processes messages 1s called a Jogger. The default logger is called "root." 


Log messages are tagged with priority codes, and each priority code has a method that 
produces a log message of that priority. The code methods are 


debug(), info(), warning(), error(), critical () 
To log a message, follow these steps: 


1. Type logging, a dot, and the method corresponding to the message priority, 


for example: 
2. logging.warning 


Tip By default, the root logger doesn't do anything with debug or info messages. 
If you want to see your message without reconfiguring the logger, use one of 
these methods that indicate a higher priority message: warning(), error(), 
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Or critical (). 


3. On the same line, in parentheses, type a message as a string. 
4. logging.warning("Watch out!") 


The following example tells the root logger to log a warning message and shows 
the result you get if you type this in interactive mode: 


>>> logging.warning("Watch out!") 
WARNING: root:Watch out! 


TECHNICAL The default behavior of the root logger is as follows: 
STUFF 
e Process warning messages and higher, ignore debug and info 
messages 


If you try to use the info() method with the default root 
logger, you get no response: 


>>> logging.info("Try and see me") 
Sas 


e Print messages as strings 
e Log messages to sys.stderr 


In interactive mode, sys.stderr prints to the screen. 
Changing the configuration of a logger 


To change the default configuration of the root logger, or to set up the configuration of a 
logger you write yourself, use the basicConfig() function. Figure 17-1 illustrates some 
example basicConfig() code. 


Log al messages of “detwo™ level of higher 


” i. PY ae 
logging.basicConfig(level=logging.DEBUG, 
__— format= %{asctime)s %(levelnames)s %{message}s’, 

een filename="/tmp/myapp.log’, 
sae filemode="W’) - Specify a file to write logging 
ai messages lo and prepare fhe 
Print the log message string using the file for writing 


; 
following foamat Glascthme) specifies 
’ 


a date and time format, 


Figure 17-1: Example of logging. basic Config() code. 
The keyword arguments for basicConfig() require Python 2.4 and later. 


Designing your own logging system 


If different parts of your program have different logging requirements, here's some 
guidance to get you started creating your own loggers. 
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Tip Name your loggers according to the part of the program you're logging errors from 
and then configure your logger in these basic ways: 


e Levels of messages to process 
The levels are associated with integers, as follows: 


CRITICAL: 50 

ERROR: 40 

WARNING: 30 

INFO: 20 

DEBUG: 10 

e Format of the messages 

e Handlers that determine where the messages should be logged. 


Oo 0 0 0 0 


Most handlers are of the type StreamHandler or FileHandler. There's also 
SMTPHandler (sends error messages via e-mail), HTTPHandler (sends 
messages to an HTTP server), and more. 


Seeing a few logging functions 


These are the most useful basic functions available in the logging module: 


log () 

Logs a message. Specify options in this order, separated by commas: priority level 
(as an integer), message (a formatting string with % codes), and the arguments to 
include in the string. Add exc_info at the end to include exception information in 


the message. This example logs a warning message by using % formatting: 


SS> logging, lee (30, Mioeillo, ais iclivene wee", Myzoui™)) 
WARNING: root:hello, is that you? 


disable () 


Disables messages at the specified priority level and below. Overrides the levels 
specified in loggers. Specify the priority level as an integer. 


shutdown () 


Shuts down the logging system. Flushes and closes all handlers. 


Chapter 18: Processing Text 


Overview 
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There are whole books on Python's text-processing capabilities, so this chapter covers 
only the basics—three of the most important text-processing modules that come with 
Python. In this chapter, you find out about 


e The re module, which brings the power of regular expression text searches to 
Python 

e The stringTfo library, which lets you work with strings as if they were files 

e The textwrap library, which includes various tools for tidying up paragraphs 


Tip More text-processing information can be found in Chapter 6. 


A Million Ways to re, You Know That There Are 


The re module supports searching text files by using regular expressions (regexes for 
short)—strings that specify a set of match rules. 


The following sections describe 


e The character codes that you use to build regexes 
e The search abilities of Python's re library 


Tip When not to use a regex 


Python's data types include a number of easy-to-use text-search methods. We 
recommend that you get familiar with these methods and use them rather than the re 
module if they will fit your needs. 


Regular expressions are complex and hard to use, and it's easy to make mistakes 
when using them. 


For example, if you have a file of quotations, each separated by '%%\n"', you could 


use a regex to split the file into separate quotes. But it would be better to use the 
split () method of strings. 


Writing a basic regex 
To create a regex, just write it as a string—that is, enclose some text in quotation marks. 


You can use single or double quotation marks, or even triple quotes. Your string can be 
just a set of characters to look for, or it can include special instructions. 
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The following string, if used with one of the re methods or functions, uses the [] 
characters to give special instructions to search for either monty or minty (but not 
mointy, a city in Kazakhstan): 


Uae [Oat | avy 

Tip Both regexes and Python strings often use the backslash (\) to indicate special 
treatment for a character. The re module recognizes Python's \ codes, like \n for 
newline. (We discuss these codes in Chapter 6.) But to avoid conflicts, you should 
usually write regexes as raw strings by prepending r to the string. The r tells 
Python's string data type to ignore any backslashes inside the string and save them 
for the re method or function to process. 


Setting up a basic regex search 


The following code gives the name zipcode to a regex (specified as a raw string) that 
looks for the text Zip: followed by any number of white space characters (\s*) and five 
digits (\d): 


>>> zipcode = r'Zip:\s*\d\d\d\d\d' 
TECHNICAL To use the zipcode regex to search for a zip code in a string, type 
STUFF code like this: 


>>> addr = "Street address: 342 Anywhere Rd \n Zip: 
SAAS) Sy) 


>>> re.search(zipcode, addr) 


e Ifthe search is successful, it returns an object of the type 
_sre.SRE Match, like this: 

e <_sre.SRE Match object at 0x8202640> 

e Ifthere is no match, it returns None. 

OQ SSS imbysicieiling; = ico)” 


e >>> print re.search(zipcode, mystring) 
e None 


Regular expression characters and codes 
The search rules in a regular expression can be simple (the simplest useful regex is a 
single character) or complex. Complex rules are implemented by using characters that 


have special meanings in regular expressions. We describe these characters and codes in 
the following sections. 


Special characters 


Table 18-1 shows the most commonly used of the characters that have special meanings 
in regexes. 
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Character 
. (dot) 


_ (caret) 


{ number } 


Table 18-1: Regular Expression Special Characters 
=™>Open table as spreadsheet 


Action Example and Notes 

Matches any single character except a 'p.thon' matches python, 

newline. p!thon, p2thon, but not 
pyython. 

Encloses a set of characters to match (a 'T[iy]ger' matches Tiger or 


character class). A match occurs if any one Tyger (but not Tiyger). 

of the characters is found. You canenclose 

specific characters or a range of characters, If you want to match a hyphen 

such as all the uppercase letters: [A-2] inside a character class, it must 
be the first or last character. 


[\s.-] matches space, period, 
or hyphen. 


At the beginning of a character class, , stands '[,a]' matches any character 
for "not." Outside of a character class, , that isn't ‘a’. 
matches the beginning of a line. 
'A' matches 'America' but 
not 'Canada’. 


Matches the end of a line or string, depending 't$' matches 'bright' but 


on the MULTIMODE flag. (See Table 18-3.) not 'tender'. 

Matches either of two regexes, starting with 'python' | '‘asp' first looks 

the leftmost and stopping when it finds a in a string for 'python'. If 

match. You can use multiple | characters to 'python' is not found, it then 

join several regexes. looks for 'asp'. 

Matches the preceding regex only if the regex 'y{3}' matches 'pyyython' 

occurs a certain number of times but not 'python' 

consecutively. 

Encloses a regex group. You can refer to r "Phones (\d{3})- (\d{3}- 

groups by number. The first group is referred \4{4}) ' looks for a phone 

to by \1, the second by \2, and so on. number, with the area code in 
group | and the number in 
group 2. 


\1 refers to the area code group 
in the above regex. 


Escape code—treats the character that '\s' searches for the $ 

follows as special, or treats a special character in text instead of 

character as text. See the following section. _ treating $ as an "end of line or 
string" code. 
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Code 


\ number 
\A 
\b 
\B 
\d 
\D 
\s 
\s 
\w 
\W 
\Z 


Long Name 


Table 18-2: Regex Codes with Special Meanings 


Action 


IGNORECASE I 


LOCALE 


L 


MULTILINE M 


DOTALL 


UNICODE U 
VERBOSE Xx 


Short 
Name 


=>Open table as spreadsheet 


Matches the contents of the group with that number 
Matches only at the start of a string 

Matches an empty string at the beginning or end of a word 
Matches an empty string in the middle of a word 
Matches any digit (0-9) 

Matches any character that is not a digit 

Matches any white space character 

Matches any character that is not white space 

Matches any alphanumeric character 

Matches any character that is not alphanumeric 
Matches only at the end of a string 

Table 18-3: Optional Flags for Regexes 


=>Open table as spreadsheet 
Meaning 


Matches are case-insensitive. 
Matches depend on the current locale. 


Allows , and $ to match linefeeds (\n) inside a string. Without 
it, they match only the beginning (,) and end (s) of a string. 


Dot (.) matches any character including newline. 
Matches depend on the Unicode locale. 


Allow spaces and comments in regexes. This makes them 
easier for humans to read. 


Escape codes with special meanings 


When you use \ in front of some characters in a regex, it creates a code with a special 
meaning. Table 18-2 lists these special regex codes. The re module also recognizes the 
Python escape codes—for example, \n for newline. 


Finding repeats 


One common regex task is to find repeats of a character or other match. You write a regex 
to find repeats by using the characters +, *, ?, or {}. 


Python lets you search for repeated characters in a number of ways: 
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e »* matches the previous character any number of times, including none. 
py*thon matches pthon, python, pyython, and so on. 

e +matches the previous character one or more times. 
py+thon matches python and pyython, but not pthon. 

e 2? matches the previous character zero or one times. 

py?thon matches pthon and python, but not pyython. 

Numbers enclosed in curly braces ({ }) create a range of repeated characters, like this: 
¢ py{5}thon matches an exact number of repeats: pyyyyython 
¢ py{1,3}thon matches a range of repeats: python, pyython, and pyyython. It 
doesn't match pthon or pyyyython. 

Tip If you leave out the first number of the range (for example, py{,3}thon), the 
first range defaults to 0. This example would match pthon, python, pyython, 
or pyyython. If you leave out the last number of the range (for example, 
py{1,}thon), any number of repeats equal to or larger than the first number 
will match. 

Regexes support two kinds of repeating searches: Greedy and non-greedy. 
Greedy searches 


A greedy search tries to match as many of the search characters as possible. 


The following regex greedily searches for one or more y characters in a row. It finds all 
three y characters in the string 'pyyython': 


SSS jo = Vownyvtiaom! 

Pe enssechy ie = ViApl 

Pe imiymlencClay = ice\ Siyelacielal (GiaSercly 1c, 9) 
>>> mymatch.group () 

"yyy' 


Non-greedy searches 
A non-greedy search stops after it finds one match. 
To specify a non-greedy search, add ? after either 


e The repeat character 
e The end brace of a range 
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The following regex searches for one or more y characters in a row, but the ? character 
specifies a non-greedy search. As soon as this search finds one y, it's finished: 


Boe 


1S) 


= 'pyyython' 


Pr mom eueeSechy ie = "Ware" 
MMVAMNEIECI = iE SOeSiecla (imu weassschy iS, jo) 
mymatch.group () 


Pa 
oe 


vay 


Regex groups 


Regexes can be complicated to read and to type. One way of making regexes easier to 
read is to enclose a portion of a regex inside parentheses to make a regex group. This also 
makes the regex easier to use later because you can refer to the group by typing a number 
or name instead of retyping the whole regex. 


Python deals with two types of regex groups: 


Numbered groups: Whenever you group part of a regex inside parentheses, it 
automatically gets a number—1 for the first group, 2 for the second, and so on. 
Named groups: To name a group, begin the statement inside parentheses with 
this code (substituting your name): 

2?P<groupname> 


Follow it with the regex and then type a closing parenthesis. 


The following example creates a group named zip that searches for the text Zip: 
followed by any number of spaces and exactly five digits. 


(2?P<zip>Zip:\s*\d\d\d\d\d) 


You can refer to a numbered or named regex group in several ways: 


You can refer to a group in any MatchObject method that takes a regex as an 
argument. In the examples below, the groupdict () method returns a dictionary 
containing the group name 'zip' and its matches. The group () method returns 
the matches of the second group: 

SSS iy weCes = il (PPE<zalorAaljos \si \ONcl\GNGING)) \s* (Sigenees \s wa Vue) © 
SS> acllies = “Aijos LOO10 Sirateee INA! 

SES VY = iS, SOeuecln (my wacerx, excleies))) 

SS> y.cqroupcieic (! wale! )) 

{"@ajo"s Yeas LOCO!) 

>> 7 .cneowya (2) 

VSicaices IN" 

When writing a regex, you can refer to the 'zip' group by using the following 
syntax: (?P='zip') 

You can use a group when specifying replacement text: \g<zip>. 


219 


Tip For an example of a regex with multiple groups, see "Making regexes more 
readable," later in this chapter. 


Using the regular expression module 
There are two ways to use the re module: 
e Use functions to perform regex operations on a string. 
e With the compile () function, turn the regex into an object (called a regular 


expression object; in some Python messages, it's called an SRE_ Pattern object) 
and use the object's methods to work with the regex. 


Functional versus methodical 

Tip If you want to use a simple regex only once, it's okay to use the functions. But for 
code that uses regexes heavily, we recommend using the compiled form of regexes 
for two reasons: 


e¢ The compiled object works faster. 


e Remembering the order of arguments is easier if you always use the compiled 
form. 


Using a regex function 
To use a regex function, type these elements: 


e The function name 
e Parentheses 


Pass any arguments inside the parentheses, separated by commas. The regex is 
always the first argument. 


This example uses the findal1() function to return a list of all the vowels in a string. 


>>> mystring = 'Your father smelt of elderberries!' 
>>> print re.findall('[aeiou]', mystring) 
Mor vs Vue, ae: Sey Lens Vora, Ta as Koes Vela Te'] 


Some regex functions return a match object. See "Querying the match object," later in 
this chapter. 


Using a regex method 
To use a regex method, type these elements: 


e The name of the regex object that you created by using the compile () function 
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See the next section, "Compiling regular expressions." 


e Adot 
e The method name 
e Parentheses 


Pass any arguments inside the parentheses, separated by commas. 


This example creates a regex object that finds vowels and uses the findall() method to 
return a list of all the vowels in a string. 


>>> mystring = 'Your father smelt of elderberries!' 

>>> vowels = re.compile('[aeiou]') 

>>> vowels. findall (mystring) 

ious, Vine, aele, Seu ben 5 Von Tas Ua es Kea. isk Te'] 


Compiling regular expressions 


To turn a regex string into a regular expression object, you use the compile () function. 
compile () takes a regex string and optional flags (see Table 18-3) as arguments. Here's 
an example: 


SSS iy Wecox = iS, Comsalea(Vjoysrelaem!, 12S: 5 INCINOINE(C/AS Ia) 
>>> my regex 
K SHS, SRE Patiermn Cloyect cic Oxs2200E80> 


The object contains the following data attributes: 


¢ flags: A number corresponding to the flag used when the re object was 
compiled. See Table 18-3. This is 0 if no flags were used. 

QQ SSP iy wSCexx. LAGS 

° 2 

* groupindex: A dictionary mapping any group names to group numbers. The 
dictionary is empty if no group names were specified. 

¢ pattern: The regex from which the object was compiled. 

OO See ily 1eSCiSr<. OSCE Sei 


e =6'pyt+thon' 
Tools of the re module 
The following regex functions and methods are the most useful. 


findall() 


findall() returns a list of all the occurrences of a regex in a string. It also takes an 
optional flag argument (Table 18-3 lists the flags). 
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Tip finditer() does the same thing as findal1() but returns an iterator object rather 
than a list. 


matchQ) 
To find a single occurrence of a regex at the start of a string, use match (): 
¢ match() takes a string and an optional flag as arguments (Table 18-3 lists the 
flags). 
¢ match() returns either 
o AMatchObject instance (if there is a match) 
See "Querying the match object," later in this chapter. 
o None (if there is no match) 

REMEMBER It's easy to confuse match (), which matches only at the beginning of a 
string, and search (), which matches anywhere in a string. (See the 
following section.) One way around this problem is to always use 
search (), specifying , in your regex to match only at the beginning of a 
string. 

search(Q) 

To find a single occurrence of a regex anywhere in a string, use search (). 

¢ search () takes a string and an optional flag as arguments (Table 18-3 lists the 
flags). 
¢ search() returns either 
o AMatchObject instance (if there is a match) 
See "Querying the match object," later in this chapter. 
o None (if there is no match) 
split?) 


To split a string wherever a regex is found, use split (). 


e split () takes these arguments: 


o 6Astring 
o An optional integer specifying the maximum number of times to split the 
string 


e¢ split() returns a list of the substrings. 


This example shows the workings of the sp1it() function: 
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>>> cowl = "I never saw a purple cow, @@ I never hope to s one;" 
SSS ie. sjolliie(!!, Couil) # function 
['I never saw a purple cow, ', ' I never hope to s one; '] 


This example uses the split () method with the same regex: 


>>> cow2 = "But I can tell you anyhow, @@ I'd rather see than be one." 
>>> sep = re.compile('') 

>>> sep.split (cow2) 

['But I can tell you anyhow, ', " I'd rather see than be one."] 

subQ 


To find all occurrences of some text in a string and substitute other text, use sub (): 


e When sub() finds a match, it substitutes the replacement string. 
* sub() returns either 

o The changed string (if any changes were made) 

o The original string (if no changes were made) 


sub () takes these arguments: 
e Areplacement string 
e The original string 


e An optional integer 


The integer specifies the maximum number of replacements. If you leave the 
integer out, all matches are replaced. 


The following code shows the result of replacing part of a string: 


>>> mystring = "European swallow" 

>>> myregex = re.compile ("European") 

>>> myregex.sub("African", mystring) method 
"African swallow' 

>>> re.sub('European', 'African', mystring) function 
"African swallow' 


Querying the match object 


A successful regex search returns a Python object called a Matchobject, as illustrated in 
the following example: 


PLS ity MSgesk = mS, Compile ("joysrclaom " ) 

>>> mystring = "pyyython is a suuuper programming language" 
>>> mymatch = my regex.match (mystring) 

>>> mymatch 

K Sie. SNH Meech Gojece ait OxsiZeoo2s> 
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The MatchObject stores information about where in the string the match starts and ends, 
the substring(s) that were matched, and so on. To get access to this information, you use 
the following methods: 


¢ group (): Takes the group index number as an argument; this defaults to 0, or the 
first group, which returns the matched string. 

e >>> mymatch.group (0) 

e 'pyyython' 

¢ groups (): Returns a tuple containing the match found for each regex group. 
Takes an optional argument to return for the regex groups that didn't match 
anything; this defaults to None. 
Tip The regex for the following Matchobject has two groups, one named group 

for zip code and one numbered group for the string 'USA'. 

e >>> y.groups () 

° (Yaajog YAOI, VUBAY) 

¢ groupdict (): Returns a dictionary containing all the named subgroups of the 
match. The keys are the subgroup names. Takes an optional argument to return for 
nonparticipating groups; this defaults to None. 

e >>> y.groupdict () 

oO 6 Ymaljiot sg Ymsliog Qaliloi | 

¢ span (): Returns a tuple containing the starting and ending index numbers that 
were matched. Takes the group index number as an argument; this defaults to 0, or 
the first group. 

e >>> mymatch.span (0) 

O 860, 8) 


You can get just the starting position with start () and just the ending position 
with end(). Both of these also take the group index number argument. 


Tip Additional regex resources 


The "Regular Expression HOWTO" Web page at 
http://www.amk.ca/python/howto/regex/ includes a basic tutorial on regular 
expressions. 


Regex debuggers can help clarify regex problems. Kodos, the Python Regular 
Expression Debugger (available at http://www.kodos.sourceforge.net/), is one such 


tool. 


A tool for practicing and testing regexes comes with Python 2.4. It's part of the GUI 
tool kit Tkinter, and it's usually installed at Tools/scripts/redemo.py 


Kiki, which comes with the WxPython installer, has nice coloring for grouping in 
regexes. More info about WxPython can be found at http://www.wxpython.org/. 
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When you're writing a program that uses a regex to search a string, you usually want to 
check whether there was a match. To do this, you check whether the match object is 
None. For example, you might use this code (remember that if mymatch is short for if 
mymatch is not None): 


if mymatch: 

print 'Match found' 
else: 

print 'No match!’ 


Making regexes more readable 


You can use the VERBOSE flag when you write a regex to make it more readable. With this 
flag, you can use comments and white space in your regex. Here's how it works: 


e The regex interpreter ignores any white space that is outside square brackets. 

e Anything following the 4 character on a line—as long as # isn't inside square 
brackets—is considered a comment rather than part of the regex. 

e Using a raw string with triple-quotes is the simplest way to write a multi-line 
regex with comments. 


This example searches text for a phone number. It finds numbers with or without 
parentheses around the area code, and with or without a space, period, or hyphen as a 
separator. 


SS> plone C6 = ES.cCempule (ie! '" 
(2 optionally match begin parenthesis 
(?P<area>\d{3}) match/name group for area code 
We # optionally match end parenthesis 
INS.=] 2 optionally match space, period, or hyphen 
(?P<prefix>\d{3}) # match/name group for prefix 
[\s.=] 2 # optionally match space, period, or hyphen 
(?P<suffix>\d{4}) match/name group for suffix 
ewer UU 4 Tee), WHRUEYO'S IE) 
PES OMCMES TS. SUS (1e" \EKeeay— \oKoreiixs—\eKsuiiisS!, | (Zils) sS5-1212") 
Zi 3=555-1212 
SSP Pome “YS. sud (e" \Gxanea>—\o<qowetiseS—\qxsuiiisS>”, "213,555.1212") 
ZL3=55 9 1212 


Strings Disguised as Files 


The cStringIo and String1Io modules support "string buffers" (also called memory 
files) that let you pretend that a string is a file. This is useful because some Python library 
modules, such as the tarfile module, work with files but not with strings. Python holds 
the "file" in memory instead of writing the file to disk. 
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StringIO versus cStringlO 


The cString1Io module implements the same functionality as StringIO. cStringI0 is 
faster, but it has a couple of limitations: 


¢ cStringIo doesn't support subclassing. Use StringIo if you need to write a 
subclass of StringI0. 


e Ifyou give Unicode strings to cStringI0, you must encode them as UTF-8 (see 
Chapter 6). 


Creating a StringlO object 
There are two ways to create a StringIO object: 
e Pass it a string as an argument. This is called initializing the buffer. 
Do this if you want to read or otherwise process an existing string. 
e Leave out the argument, which creates the object with an empty buffer. 
Do this if you want to write to the object. 
To create a StringIo object, follow these steps: 


1. Type from cStringIO import StringIO. 
2. On the next line, type code like one of the following lines. 


Substitute the name of your file and your string (or its name). 
To create an empty buffer, type this: 
pretend file = StringI0() 
To initialize the buffer with an existing string, type this: 
pretend file = StringIO("mystring") 

StringlO special methods 


The methods for working with stringTIo files are the same as the methods for working 
with regular files (described in Chapter 17). 


Here are two differences: 
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e The StringtIo object includes the getvalue() method, which returns the entire 
contents of the memory file as a string: 

O SSS jprecenel tile. ervalume () 

Ce ViaiyyAsiicireabiavee Y 

e The StringIO close() method frees up the memory buffer. 


A regular file object's close () method writes the contents to disk. 
Paragraph Dumplings: Filling and Wrapping Text 


The textwrap module, new in Python 2.3, is for formatting paragraphs of text. You can 
perform three basic formatting tasks with it: unindenting ("dedenting"), wrapping, and 
filling. You can also create a customized TextWrapper object. 


Removing indentation from strings 


To remove white space that appears in front of every line of the string, use the dedent () 
function. It takes a string as an argument. 


TECHNICAL Dedent is meant to be the opposite of 
STUFF indent. 


One use for this function is to make triple-quoted strings in a Python program— 
docstrings for example—line up with the left margin, even though they are indented in 
the source code. Here's an example: 


SSS mhysicicaiae = V\ Tyger, Tyger, burning bright 

Seeks In the forests of the night''' 

>>> print mystring 
Tyger, Tyger, burning bright 
In the forests of the night 

>>> print textwrap.dedent (mystring) 

Tyger, Tyger, burning bright 

In the forests of the night 


Wrapping text by splitting it up 


To turn a long string into a list of strings, one for each line of text, use the wrap () 
function. It takes a string as an argument, and it also takes an optional width argument, 
specifying the maximum length of any line (it defaults to 70 characters). The strings 
returned don't have newline characters. 


The following example shows the use of the wrap () function on a long string: 


SSS mMAEssihioz =] VR 
. Worse, many abandon the pearl which exists in the present in favor 
. of a caravan of dreams which always recedes across the desert. -- 
Mixed 
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Metaphor Hall of Fame"™"™" 
>>> m = textwrap.wrap(metaphor, width=40) 
>>> m 
['Worse, many abandon the pearl which', ‘exists in the present in favor 
OE 
a','caravan of dreams which always recedes', ‘across the desert. -- 
Mixed 
Metaphor', ‘Hall of Fame'] 


Tip You could use a for loop to print the result of the wrap () function: 


SSS iO i sim ms 

joxealiane aL 
Worse, many abandon the pearl which 
exists in the present in favor of a 
caravan of dreams which always recedes 


across the desert. -- Mixed Metaphor 
Hall of Fame 


Here's a shorter idiom for accomplishing the same thing: 
pone atine Y Vint 5 3) @ atin (um) 
Wrapping text by adding newline characters 


To add newline characters to a long string, use the £i11() function. It takes a string as an 
argument and returns a string. It also takes an optional width argument, specifying the 
maximum length of any line (it defaults to 70 characters). 


Here's an example of wrapping text with the £111 () function: 


SSS lLojaieisicriline; = 9)” 
But once you build some islands of peace into your daily routine, 


they 
help serve as beachheads against the full-court press of life. 
--Mixed Metaphor Hall of Fame''' 


>>> wrappedstring = textwrap.fill(longstring, width=40) 
>>> wrappedstring 


But once you build some islands of\npeace into your daily routine, 


they help\nserve as beachheads against the full-\ncourt press of 


life. --Mixed Metaphor\nHall of Fame' 
>>> print wrappedstring 
But once you build some islands of 

peace into your daily routine, they help 
serve as beachheads against the full- 
court press of life. --Mixed Metaphor 
Hall of Fame 
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Creating a TextWrapper object 


The wrap () and fi11() functions work by creating a TextWrapper instance and calling 
a single method on it. 


TECHNICAL This instance goes away after you use it, so for applications that 
STUFF format many text strings, it's more efficient to create your own 
TextWrapper object. 


TextWrapper attributes 


When you create a TextWrapper object, you can specify any of the following arguments. 
You can also use them as keyword arguments for the wrap () and £111 () functions. 


¢ width: The maximum length of a wrapped line. The default is 70 characters. 

* expand_tabs: Changes tabs to an equivalent number of spaces (between | and 8). 
Default True. If set to False, tabs are treated as single characters. 

¢ replace whitespace: Changes all white space characters to spaces. Default 
True. 


What counts as a white space character is defined by the whitespace data 
attribute of the string module. This depends on your locale and operating 
system. Here's our result: 


SSS MK Sieiesliavs) 
>>> string.whitespace 
UME Vin \xOlo\xe \ie " 


The meanings of the codes are: \t (tab), \n (newline), \x0b (vertical tab), \x0c 
(form feed), \r (carriage return), ' ' (space). 


Warning If you set expand_tabs to False and replace whitespace to True, 
tabs are converted to single space characters. 


¢ initial indent: Set this to a string of five spaces or so to get an indented line at 
the beginning of a paragraph. The default is an empty string. 
¢ subsequent indent: This string is added at the beginning of all wrapped lines 
after the first. Default is an empty string. 
e fix sentence endings: Puts two spaces between sentences. 
Warning What textwrap thinks a sentence is might not match what you think a 
sentence is. For example, fix sentence endings puts two spaces after 
"Dr." Default is False. 
¢ break long words: Breaks words longer than the width attribute. If set to 
False, lines with long words may exceed the specified width. Default is True. 
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Example of a custom TextWrapper object 


Here's an example of a custom TextWrapper object. This object specifies a 40-character 
line and indents each line after the first one 4 spaces. 


>>> mywrapper = textwrap.TextWrapper (width=40, subsequent _indent=" A) 
>>> indented string = mywrapper.fill(longstring) 
SSS joralime slinelemiceyel SiciwalinG) 
But once you build some islands of 
peace into your daily routine, they 
help serve as beachheads against the 
full-court press of life. --Mixed 
Metaphor Hall of Fame 
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Chapter 19: Digging into Disk Data 


Python includes a variety of modules and packages for working with data files on disk. In 
this chapter, you find out how to 


e Use Python to work with files on your hard drive, including compressed Zip files. 

e Use Python to communicate with databases. 

e Turn Python objects (such as dictionaries) into formats that can be stored or sent 
over the Internet. 


Shell Game: Copying and Moving Files 


The shutil module lets you use Python to manipulate, copy, and remove files on your 
hard drive ("shutil" stands for "shell utility"). 


Tip You can use the os module to work with hard drive files, but shutil is usually 

simpler because it does a lot of the path manipulation behind the scenes. 

Warning If you're working on a Mac or your program will run on a Mac, use the 
macostools module, not shutil, to copy files and directories. On Mac OS, the 
shutil module does not recognize resource forks and other metadata. If you 
use shutil to copy files on a Mac, the files' resources, file types, and creator 
codes may not be correct. 


Some of the useful shutil functions are described in the following list. All the functions 
take pathname arguments (as strings)—source pathname first, followed by destination 
pathname (if any): 


¢ copy (): Copies the source file and its permissions to the destination. 


TECHNICAL The copy () function has these rules: 
STUFF 
o The destination can be a file or a directory. 


o Ifthe destination is a directory, the copy has the same 
name as the source file. 


This example copies the file 'myspider.py' to 'copyy.py' within the same 
directory: 


>>> import shutil 
SSS Slaw , cejow (Vunysiowcleie joy? , “Corsyayojony")} 


¢ copy2(): Like copy (), but also copies the last access time and last modification 
time. 
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copymode (): Copies the permissions of the source file to the destination file. 
Does not change the file contents, owner, or group. 

copytree (): Copies the entire contents of the source directory to the destination. 
The destination cannot already exist. Uses copy2() to copy files. This example 
copies the contents of 'MyDir' to 'MyNewDir': 

>>> shutil.copytree('MyDir', 'MyNewDir') 


This function also takes an optional third argument, symlinks, specifying what to 
do with symbolic links: 


o True creates symbolic links in the new directory tree. 
o False (the default) copies the contents of symlinked files. 


TECHNICALStarting in version 2.3, Python raises an oSError if any problems occur 
STUFF while implementing copytree (). (In earlier versions, Python printed a 
message.) The oSError looks something like this: 


>>> shutil.copytree('MyDir', 'MyNewDir') 
Traceback (most recent call last): 

imabllkes “ereclsins Walia I, alia 2 

File 
"/System/Library/Frameworks/Python.framework/Versions/2.3/lib/ 
fOyAclaoiN? ,3/ Slowest joy", iaine 102, sil Clroyosrcieee 
OSError: [Errno 17] File exists: 'MyNewDir' 


move (): New in Python 2.3. Moves the source to the destination. 


TECHNICAL The move () function has these rules: 
STUFF 


The destination cannot already exist. 
If the source and the destination are on different file 
systems, move () first does a copy to the destination 
and then deletes the source. 
o The paths can be files or directories. 
This example moves a directory: 
shutil.move ('MyNewDir', '../MyNewDir') 
rmtree (): Deletes the specified directory and its contents. 


Error messages depend on the optional argument ignore errors: 


o When set to True, ignore errors Suppresses any error messages. 
o The default is False, which causes Python to raise an exception if it 
discovers an error while executing rmt ree (). 
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An advanced feature also lets you customize the exception handling. 


This example removes the directory that we moved in the previous example: 


shutil.rmtree('../MyNewDir') 


Zipping and Unzipping 

The zipfile and gzip modules support data compression and expansion. 

zipfile 

The zipfile module provides access to Zip files, which are created with WinZip (the 

most common Windows file compression utility), the Mac OS "Create Archive" 

command, or the UNIX zip utility. 

TECHNICAL In Windows XP, Mac OS, and UNIX, Zip files are similar to folders 

STUFF (because they can hold files and folders). If you select a folder in 
Windows XP and make it a compressed folder, it's compressed as a 


Zip file. 


The zipfile module lets you use Python to create, read, write, append, and list a 
compressed Zip file. 


Warning The zipfile module doesn't work with either 
e Zip files with appended comments 
e Zip files that span multiple disks 
Finding out whether a file is a Zip file 
To identify a file as a Zip file, import zipfile and use the is zipfile() function: 


e Its argument is a pathname string. 
e Itreturns True if the file is a valid Zip file. 


This example checks a file called 'stuff.zip': 


>>> import zipfile 

SS> girlie as galoitille (VMyADiie/ stewie. wala! ) 

REDS) 

Warning Don't make incorrect assumptions if is_zipfile() returns False. It also 
returns False if the file doesn't exist. 
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Creating a ZipFile object for reading or writing 


To read or write to a compressed Zip file, you create a ZipFile() instance. 


To create a ZipFile object, follow these steps: 


1. 
2. 
3 


Type a name for the object and = zipfile.ZipFile(. 

Type the path to your Zip file (as a string). 

If you want to write to the Zip file, type a comma and 'w'. 

Warning Using 'w' erases whatever is currently in the file, so make doubly sure 
that you don't need to keep what is already in the Zip file before you go 
appending 'w'. 

There is another mode, 'a' for append. This is an advanced feature we don't 

expound on in this book. 

Tip If you don't want to compress the stored data, skip to Step 5. 

If you want to compress the data, specify a compression type by typing a 

comma and compression= ZIP DEFLATED 


ZIP STORED (the default) stores data without compression. 


TECHNICAL The z1ib module must be available if you want to compress 
STUFF the data. 


End by typing ). 
If you fill in both optional parameters, your instance might look like this: 


SSS mh = ajo, “ajo ike (MiyDilie/sx.miljo", iw", 
compression=zipfile.ZIP_DEFLATED) 


>>> myz 
<zipfile.ZipFile instance at 0x780a8> 


Working with a ZipFile object 


ZipFile instances are similar to file objects. They have these methods: 


close() and read(): Work like the file () object methods of the same names 
(see Chapter 17). 

getinfo (): Takes a string specifying the name of an item in the archive and 
returns a ZipInfo() object containing information about the item. 

>>> myz.getinfo('pagecount.py') 

<zipfile.ZipiInfo instance at 0x79fa8> 


One way to see the attributes of the zipInfo() object is to use the dir () 


function. The following example looks at the contents of a ZipInfo() object 
called myinfo and then looks at its file size: 
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>>> dir (myinfo) 

["ORO", Tiilekisecec!, " coe  "; 7 imi ", %  mocmile ", 
"comment', 
"CCMSHESS Sime”, SComemesS _tiyoe'!, "Cweake System” , 
"CLSGES VERSiem" , 

"Gleiee time’, "Geirewmell aire’, “Sxewe, VERitwaci wersiem , 
MedlS OFS Y 
“tile sige", “tillleneme”, Vitlag Isis", Vneecleir oriseit" » 
Vaimeewimell aieicie” , 
‘Orig tilememe”’, Tweseievec”, “wolumne” | 
SPY MVALAIEO. tile Size 


4220L 

infolist (): Returns a list of ZipInfo() objects for all items in the order they're 
stored in the archive. 

namelist (): Returns a list of strings—the names of the items in the archive—like 
SO: 


>>> myz.namelist () 

| ‘eojoay.ow", Uaedllleworilel joy! , Vunysjoucler joy! , Noagecouin sjoy!, ‘sx! 
printdir(): Lists the items in the archive to sys. stdout. 

>>> myz.printdir() 


File Name Modified 
Size 

copyy.py AVOS—OS—O7 losses 54 
4220 

helloworld.py Z0O05=-O7-O5 iss @ile2¢ 
22 

myspider.py AVOS=-Ll=O6 ZilgA2AQsO2 
4220 

pagecount.py ZOO5=1O-10 Ise @il silo 
367 

x 2006-02-24 13:49:46 
0 


testzip(): Examines the checksums (CRCs) of all the items in the archive. 
Returns as a string the name of the first bad file detected; if all files are good, 
returns None. 

write (): Takes as an argument a string specifying a pathname and writes the 
specified file to the archive. 


The archive must be open for writing or appending. write () takes two optional 
arguments: 


o arcname, a string specifying the name for the file in the archive (it defaults 
to the filename) 
o Acompression type, either ZIP STORED or ZIP DEFLATED 


This example writes the file 'helloworld.py' to the archive and stores it with 
the name 'foo!: 


myz.write('MyDir/helloworld.py', arcname='foo') 
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¢ writestr(): Writes a data string to the archive. The archive must be opened with 
mode 'w' or 'a'. Takes two arguments: 

o Astring specifying the name that the written data string will have in the 
archive (or a ZipInfo() instance with at least a filename, a date, and a 
time) 

o Astring containing the data to store 

O  iiyzajout WHeinesicie (Giait@, “May Cleve’) 


The ZipFile() instance has a data attribute, debug, which specifies the level of debug 


output to print to sys.stdout. The options range from 0 (no output) to 3 (the most 
output). The default is 0. 


Zipping up a Python library 

PyZipFile() instances create zip archives containing Python libraries. Python can 
import modules from Zip files. Putting the modules your program uses into Zip files is 
one way of delivering a smaller program. 

To create a zip archive with Python files, follow these steps: 


1. Create a PyZipFile() instance. 


Use the instructions in "Creating a ZipFile object for reading or writing," earlier 
in this chapter. Open the archive for writing or appending. 


Your code will look something like this: 


vior4 = Malorne, Dyan dle (Vico) ~aljo!), “wy )) 


2. On the next line, type the name of your instance, a dot, and writepy (. 
3. On the same line, type a file or directory pathname and ). 


If you type a filename, the suffix must be .py. This example uses a directory 
pathname: 


mypz.writepy('MyDir') 
How writepy() works depends on the pathname you type: 


e Ifyou type a filename, the file is added to the archive. 

e Ifyou type a directory name, all the Python files in the directory with the suffixes 
-py, -pyo, Or .pyc are added to the archive. The file locations depend on the 
path's location: 
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o Ifthe path is a regular directory, the files are added to an archive at the top 
level of the directory. 

o Ifthe path is a package directory, the files are added under the package 
name and subdirectory names (if any). 


REMEMBER pyZipFile() objects use the same methods as ZipFile() objects. 
gzip 


The gzip module reads and writes compressed files that are compatible with the GNU 
program gzip. To your program's users, the contents of these files look like ordinary files 
(that is, not zipped). 


TECHNICAL The gzip module doesn't support all the formats that the GNU gzip 
STUFF utility supports. For example, compress and pack aren't supported. 


To use Python's gzip module, use its open () function to create a GzipFile() instance. 
The instance works like a file object, but with these differences: 


e The GzipFile() instance doesn't support the readinto() or truncate () 
methods. 

e The close() method does not close the file object the instance is using. It closes 
the instance only. 


This feature allows you to do something to the file object after working with it as 
a GzipFile() instance. For example, if the file object is a StringIo object, you 
can use the StringI0.getvalue() function to retrieve the contents after closing 
the GzipFile() instance. 


To create a GzipFile() instance from a file on disk, follow these steps: 


— 


Type gzip. open ( 
2. On the same line, type the path to your gzipped file as a string. For example, 
3. gzip.open('filename.gz' 


Tip If you don't want to write to the file, skip to Step 4. 
4. If you want to write to the file, type a comma and 'w'. 
Warning Using the 'w' mode erases whatever is currently in the file, so make 
sure that you don't need to keep what is already in the file before you go 
appending 'w'. 
5. There is another mode, 'a' for append. This is an advanced feature we don't 
document in this book. 


Tip If you want the file to open and close faster, you can tell gzip to create a 
faster, larger file: 


a. Type a comma. 
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b. Type compresslevel-. 


Cc. 


Type an integer from 1 (fastest, largest file) to 9 (slowest, smallest 


file). 


By default, the compresslevel value is 9 (slowest, smallest file). 


6. End the line of code by typing ). 


If you fill in both optional parameters, your code and the result might look like 


this: 


>>> import gzip 
SSS) uy aL dL 


>>> myfil 
<gzip open file 


TECHNICAL 
STUFF 


S 
S 


= gzip.open('quote2.new.gz', mode='w', compresslevel=8) 


"quote2.new.gz', mode 'wb' at 0x4b5c0O 0x5d490> 


It's also possible to create a GzipFile() instance from a file or file- 
like object (such as a StringIo object) already opened in Python. 
This works much like using the open () function, but the order of the 
arguments is different: 


iF 


2 


Mode (optional, defaults to the mode of the Python file 
object) 

Compression level (optional, defaults to 9 or 
slowest/smallest) 

The name of the Python file object (required) 


The creation of a GzipFile() instance might look something 
like this: 


>>> gfile = gzip.GzipFile(mode='w', 
compresslevel=8, fileobj=myfile) 
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Sussing Out SQL Databases 


The sqlite3 module gives you an easy way to store your data by using the SQLite 
library, an implementation of the universal database language SQL. (SQL is pronounced 
"sequel" and stands for Structured Query Language. SQLite is pronounced "sequel-ite"). 
sqlite3 is a new standard module in Python 2.5. If you have Python version 2.2 through 
2.4, you can download the essentially similar pysqlite module from 
http://www.pysqlite.org/. 


We don't have the space for detailed information on using SQL—SQL is a whole 
programming language all by itself. But in the sections that follow, we show you the 
basics of using sqlite3 as well as some quick tricks for using SQL. (SQL For Dummies, 
5th Edition, by Allen Taylor has more information.) The Web site http://www.sqlite.org/ 
also has lots of documentation. 


Installing SQLite and sqlite3 


The sqlite3 module is automatically installed when you install Python 2.5. On 
Windows, the Python 2.5 installation also includes SQLite itself. On other platforms, 
make sure that SQLite is installed before you build Python 2.5. See the README file in 
your Python download for details. 


Setting up a SQLite database 


To create or use a SQLite database, you open a connection to the database (it's like 
opening a file) and create a cursor to access the database. 


You can open a connection in two ways: 


¢ Disk-based connection: Allows you to write changes to the database file on disk. 

e In-memory database: The connection works faster, but you can't write changes 
to disk. It's most useful for loading some data into memory and then running a 
bunch of queries. 


To open a connection to a database on disk, follow these steps: 
1. Type import sqlite3. 


2. On the next line, type the code to open the connection: 
o Fora disk-based connection, type the name of the database in the 


parentheses: 

conn = sqlite3.connect ('my database') 

For an in-memory database, type the special string ':memory: ' in the 
parentheses. 
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You can type a pathname to an existing database or type a new filename to 
create a new database. 


conn = sqlite3.connect(':memory:') 


3. On the next line, create a cursor by typing code like this: 
4A, CUesSor = COM CUKSOx (()) 


The cursor is like a bookmark telling you where you are in the database. 
Working with a SQLite database 


This section describes some basic tasks you can do with the database: creating a table, 
adding data to a table, and getting data out of a table. 


Creating a table 


A table is an area in a database where you store related information. A table consists of 
records (sets of information), which in turn contain fields (each field is one bit of 
information of a particular type). To create a table, you write a query (instructions to the 
database), and then use the cursor's execute () method to run the query. To create a table 
to store addresses, you might use code like this: 


Clery = VN -- sqlite queries are strings; comments use 
Veen) 
create table address -- create a table named "address" 
( -- begin creating a record 
name varchar, -- create a field called "name", with type 
"varchar" 


street varchar, 

Gale swelieclaeie 
) -- end of this record 
aaa # end of query 


The above code is formatted so that it's easier for humans to read. SQL doesn't care about 
indentation in query strings. The text after -- on each line represents comments. Python 
ignores these. 


REMEMBER The code varchar in the preceding example is a type (a declaration of the 
kind of information to be stored in that field). varchar specifies that the 
field contains text of varying length. SQLite requires you to define a type 
for each field. (SQLite doesn't care much about types, but most other SQL 
databases do.) 

Tip Think of SQL tables as being like Python classes except that they only contain data, 
not methods. A SQL database can have multiple tables. A record or row (the section 
within parentheses in the above example) is like a class instance. The fields or 
columns of a record (each line inside the parentheses is one field) are like instance 
data attributes, except that all records have exactly the same attributes (the data 
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associated with the fields can be different). In contrast, class instances can have 
different attributes, although most of the time they don't. 
TECHNICAL Actually, most modern SQL databases do allow for creating methods, 
STUFF but they're an advanced and complicated feature. So pretend we 
didn't mention them. 


Populating a table with data 
After you have a table, you can populate the table (add data to it), like this: 
1. Create a list of tuples. 


Each element in a tuple should correspond to a field in one of the database's 
records. For example: 


addresses = [ 
(Per Sines icc’, VIZ Manin Sie’, ViNi@ir lela’) , 
(Sauiailew Ree Sines Ibe”, "S2Zi waist Sic’, VBselicea!), 
] 


Tip For help arranging data into the proper format, see Chapter 8. 


2. Use the execute () method in a for loop to write the addresses into the 


database. 
3. for addr in addresses: 
4. cursor.execute("insert into address values (?, ?, ?)", addr) 


The execute () method's first argument is a string with the SQL command 
insert into, the name of the table, and values followed by a set of query 
parameters in parentheses. Question marks (?) stand for query parameters; 
sqlite3 passes in an argument for each parameter. The number of question marks 
must match the number of elements in the second argument. Lucky us—in the 
example above, there are three question marks and three elements in each addr 
tuple. The second argument is the target of the for loop. 


Tip Use double quotation marks or triple quotes for all Python strings that contain SQL 
queries. SQL requires string values to be surrounded by single quotation marks (see 
the example in the "Getting a subset of the values" section), so you'll make life easier 
for yourself if you avoid using single quotation marks for Python strings when 
working with SQL. 


Getting field values from a table 
To get field values from a table, you execute a query specifying the data you want and 


then use the fetchal1() method to retrieve the data. (You may have noticed by now that 
most SQLite tasks involve executing a query.) 
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Getting all the values 


This example retrieves a list of all the values stored in our table's name field: 


>>> cursor.execute ("select name from address") 
<sqlite3.Cursor object at 0x66830> 

>>> print cursor.fetchall() 

h(t Pees Shops Edy) i (ul Similar Per Shops ltal,))| 


The fetchall() method returns a list of tuples. Because each tuple in the result has only 
one element, the tuples end with a comma. See Chapter 8 for a review of tuples. 


REMEMBER SQLite stores text data as Unicode strings. See Chapter 6 for more info 
about Unicode. 


Getting a subset of the values 


The real power of a database lies in its ability to give you the subset of information that 
matches your needs. In this example, we get the records that have "Bolton" in the city 
field. If you know that the « character stands for "all fields," it's pretty easy to understand 
what this query does: 


>>> cursor.execute ("select * from address where city='Bolton'") 
<sqlite3.Cursor object at 0x66830> 

>>> print cursor.fetchall() 

[ (a Sanmilae Pet Sinejos Wire’, wiilZs Maidin Sic! wi seliem’) | 


TECHNICAL When you're starting out with sqlite3, you should probably stick 

STUFF with storing and retrieving simple strings, numbers, and dates. But 
when you get more comfortable, you can store Python objects in the 
database by using the pickle module's dumps () function. See the 
next section, "Pickling Your Data (And Relishing the Outcome)." 
Note that you won't be able to share your database with non-Python 
programs if you store pickled data. 

TECHNICAL sqlite3 follows the Python Database API Specification v2.0 (for 

STUFF more details, see Python Enhancement Proposal PEP249 at 
http://www.python.org/dev/peps/pep-0249/). Because Python 
database modules are all designed to work consistently, when you 
need a more powerful database (such as PostgreSQL or Firebird), it 
should be easy to upgrade your code: You need only change the 
connect () function call. 
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ickling Your Data (And Relishing the Outcome) 


The pickle and cPickle modules let you store data between runs of a program, the way 
a word processor allows you to save formatted documents and edit them later even if you 
quit the program. This is called pickling because it's a way of preserving your data. 


TECHNICAL Storing a Python object in a file makes it a persistent object. You 

STUFF could write your own code to store and read Python objects, but if 
your data includes complex objects, it's easier to use pickling because 
it automatically converts data back and forth between strings and 
Python objects. 


Python includes the following support for persistent objects: 


e The cPickle and pickle modules convert Python objects to and from string 


representations. 
TECHNICAL Pickling is commonly used when working with persistent 
STUFF databases, which store Python objects so that they can be 


accessed by keys. 
e The shelve module (see the next section) supports persistent dictionaries whose 
values can be arbitrary Python objects. It works with the pickle and anydbm 
modules. 


Warning Pickled data is a security risk because it might include malicious code. Make 
sure you unpickle data only from a trusted source. 


The module of choice for the majority of pickling tasks is cPickle, which is simpler and 
faster than the pickle module. The only reason to use pickle is if you need access to its 
advanced features (which we don't cover in this book). 


Pickling an object 
To pickle a Python object, follow these steps: 


Type import cPickle. 

Open a file for writing. For example, 

mypicklefile = open(myfile, 'w') 

Type cPickle.dump and in parentheses specify the name of the object you 
want to pickle and the name of the file you opened (which is where the 
pickled object will be stored). 


RON > 


5. cPickle.dump(myobj, mypicklefile) 
6. Close the file. 
7. mypicklefile.close() 


Pickling restrictions 
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Most Python data types can be pickled, with the following exceptions: 


e Functions, built-in functions, and classes defined at a level other than the top level 
of a module can't be pickled. 

e Aclasswitha dict attributeora setstate_() method that isn't 
compatible with pickling can't be pickled. 

e Tuples, lists, sets, and dictionaries that contain non-picklable objects can't be 
pickled. 

e Pickling stores only the names of functions and classes, not any code or attributes 
from them. So when you unpickle a module, its functions must be available for 
importing on the system you're unpickling from. 

e When instances are pickled, class attributes aren't pickled along with them. 


The following functions pickle a Python object. The function you use depends on what 
type of data you want returned. 


dumps() 

The dumps () function pickles a Python object and returns a string. It's useful for storing 
Python objects in databases, such as SQLite (see the previous section, "Sussing Out SQL 
Databases.") 

This function takes a Python object as an argument. 


dump(Q) 


The dump () function pickles the object and writes the pickled object to an open file (or a 
file-like object like stringIo). 


This function takes two arguments: 


e A Python object 
e A file object opened for writing 


Tip The dump () and dumps () functions take one optional parameter that specifies the 
pickling version to use. The following subsections show the right parameter for your 
version of Python. 


Python 2.3 or later 


If you're using Python 2.3 or later, the pickling version parameter for the dump () and 
dumps () functions is called protocol. 


Tip You can specify the pickling protocol version by entering one of the values from the 
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sidebar, "All about pickling protocols." 
All about pickling protocols 


There are three pickling protocols: 
¢ 0isan ASCII protocol. It's compatible with earlier versions of Python. 
0 is the default value. 
e 14isa binary protocol (also compatible with earlier versions of Python). 


e 2works more efficiently with Python's new-style classes (discussed in Chapter 
14). 


You can also specify this version by entering HIGHEST PROTOCOL or any negative 
integer. 


Protocol version 2 is new in Python 2.3. 


Older Python versions 


If you're using a version of Python older than 2.3, the optional pickling version parameter 
is called bin. It specifies to pickle the data in a binary format. (The default version is text, 
which is less efficient.) 


Warning Don't use bin in your code unless your program must be compatible with earlier 
versions of Python. 


Unpickling an object 
The loading function for a pickled Python object depends on whether it's a file or a string. 
Unpickling files 


The load() function takes a file object (or a file-like object such as StringIO) as an 
argument. The file should be opened in read mode. 1oad() reads the contents of the file 
object and unpickles it. The result is the original object that you pickled. 


TECHNICAL To unpickle a Python object from a file, follow these steps: 
STUFF 

Import cPickle. 

Open for reading the file containing the pickled object. 
mypicklefile = open(myfile, 'rb') 

Type a name for the object, = cPickle.1load, and in 


PAY r 
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parentheses the name of the file you opened. 
5. myobj = cPickle.load(mypicklefile) 
Unpickling strings 


The loads () function takes the name of a string as an argument. It reads the contents of 
the string and unpickles it. The result is the original object that you pickled. 


Using shelve with DBM-style databases 


This section is for people who need to store Python data on disk using Python 2.4 or 
earlier, or who need to use the Berkeley DB database library. (Berkeley DB and its 
relatives are called DBM-style databases.) If you have Python 2.5, read "Sussing Out 
SQL Databases," earlier in this chapter, instead. It's easier to use. (You can also download 
pysqlite for Python 2.2 through 2.4.) 


Storing pickles on a shelf 


The shelve module is a "wrapper" around several DBM-style database modules 
available in Python. The shelve module works like this: 


1. You create a shelve object and tell it to open a database file. 

2. The shelve object looks at your database file on disk and decides which Python 
database module to use. 

3. The Python database module accesses your database file. 


The shelve module and its tools have the following features: 
e The open() function automatically chooses an available database module when 
you open a file. 
e Shelve objects behave like dictionaries (see Chapter 9), with a few minor 
differences (see "Properties of a shelve object" later in this chapter). 
e Shelve objects allow you to store pickled Python objects as values. (See "Pickling 


Your Data (And Relishing the Outcome)" earlier in this chapter.) 


A shelve object is called a persistent dictionary because it can store and reconstitute 
Python objects from a disk file. 


Creating a shelve object 
To create a shelve () object, follow these steps: 


1. Type import shelve. 
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On the next line, type a name for the shelf, =, and shelve. 


open ('mydatabase'. 
Substitute the name of your DBM-style database file. 


To open the file as read-only, to open an existing file as read-write, or to 
create a new, empty database, type a comma and a flag: 


The flags are: 


'' read only 

'w' read and write 

'c' create database if it doesn't exist (default) 
'n' create new empty database 


Warning If the file exists, typing 'n' erases its contents. 
myshelf = shelve.open('mydatabase', 'w' 


If you have Python 2.3 or later and you want to use one of the advanced 
pickling protocols, type a comma, protocol= and one of the following: 


For more about the protocols, see "Pickling Your Data (And Relishing the 
Outcome)" earlier in this chapter. 


o 1 binary protocol 
o 2 or HIGHEST PROTOCOL advanced protocol; works more efficiently 
with Python's new-style classes 


myshelf = shelve.open('mydatabase', 'w', protocol=2 


To allow changes to mutable objects in the database, type a comma and 
writeback=True. 


. myshelf = shelve.open('mydatabase', '‘w', protocol=2, 


writeback=True 

Warning This feature makes the database slow to close. 

Tip If you don't turn on the writeback feature, you can still make changes to 
mutable objects by using a three-step process described in the sidebar, 
"Changing mutable objects in a shelve database." 


. Type the end parenthesis. 
. myshelf = shelve.open('mydatabase', 'w', protocol=2, 
writeback=True) 


Changing mutable objects in a shelve database 


When you create a shelve object, by default changes to mutable objects are not stored in 
the database file, because allowing this makes the database slow. To allow such changes, 
you can set writeback=True when opening a shelve object (see "Creating a shelve 
object"). But if you leave it turned off, you can use this three-step process for making 
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changes to mutable objects. These steps assume you've already opened a shelve database 


object. 


1. 


OMB W PD 


Access the mutable value by using its key, and give it a temporary name. In this 
example, the mutable value is a list and its key is 'mylist': 

temp = myshelf['mylist'] 

Make changes to the copy. In this example, we append a string to the list. 
temp.append('foo') 


Assign the changed copy back to the same database key. 
myshelf['mylist'] = temp 


Properties of a shelve object 


The objects created by the shelve module are like a cross between a dictionary and a file 
object. As with a dictionary, you can retrieve, store, search for, and delete keys and 
values. As with a file object, you can open and close a shelve object and direct it to write 
changes to a file on disk. shelve objects have these unique features: 


Keys must be strings. 

Values can be strings or picklable Python objects. 

Changes to mutable objects are not written to disk unless you set 
writeback=True when you create the shelve object. 


See the sidebar, "Changing mutable objects in a shelve database," for some 
workaround ways of changing mutable objects. 


The sync() method writes changes to disk. (Writing happens automatically when 
you use the close() method.) 


TECHNICAL When you write changes to disk by using the sync () or 

STUFF close() method, an extension might be added to the 
filename and more than one file might be created. 

Whether these happen depends on the underlying database shelve is using. 
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Chapter 20: Accessing the Internet 


Python has an array of tools for working with information over the Internet. This chapter 
introduces tools for handling Web page addresses, downloading and parsing data from 
Web pages, building XML data structures, creating and sending e-mail messages, and 
writing and installing CGI scripts (for processing Web forms). 


TECHNICAL This chapter introduces concepts for working with Internet modules, 
STUFF but we assume that you have basic knowledge of Web, XML, and e- 
mail protocols. 


Downloading Web Data 


The following sections cover opening URLs, reading their contents, and submitting form 
data to a Web server. 


Python has two modules for performing these tasks: ur11ib and urllib2. Each has 
specific strengths. We focus on the simpler ur11ib module in this section, but we 
recommend the ur11ib2 module for opening and fetching data from Web pages because 
it has better error handling. 


Opening a URL 


The function urlopen() creates objects that behave a lot like files, but it has two 
important differences from the open () function, which creates files: 


¢ urlopen() takes a Web page address (Uniform Resource Locator, or URL) as an 
argument (rather than as a filename). 
¢ urlopen() opens the URL only for reading, not for writing. 


Tip Both the ur11ib and the ur11ib2 modules support the urlopen() function. The 
function works the same way in both, but the urlopen() function in ur11ib2 has 
better error reporting on problems that occur when opening Web pages. 

Using urllib2.urlopen() to open a URL 

To open a URL, follow these steps: 

1. Type import urllib2. 
2. On the next line, type a name for the object and = urllib.urlopen (. 


3. Type the URL (as a string) and close the parentheses by typing ). 


Your code should look something like this: 
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page = 
urllib2.urlopen('http://www.python.org/doc/current/modindex.htm1') 


If all goes well, the Web page opens. If you examine the object you just created, you'll 
see something like this: 


>>> page 
<addinfourl at 137848780 whose fp = <socket. fileobject object at 
Ox837clb4>> 


Reading text from a URL 


To read text from an opened URL, use a for loop. This example prints the text to 
<sys.stdout> one line at a time: 


for line in page: 
sys.stdout.write (line) 


Understanding Web page errors 


If there's a problem opening the page, Python raises an exception specifying the kind of 
error. Sometimes the message is long, so we won't reproduce the whole thing here. The 
critical part of the message is the last line. Here's part of the message you get if the site 
exists but the page doesn't exist: 


>>> page=urllib2.urlopen('http://www.python.org/idontexist.html1') 
Pears 
Welw? js eeeoies RM mWiceowe A0As Now wewinel 


Detailed error reporting is useful when you want to write code that handles different Web 
errors in different ways. For example, you might write code that checks a Web page's 
links and logs whether the link is still valid, whether it redirects to another page, or 
whether the linked page no longer exists. (The spider. py program in Chapter 4 uses this 
functionality.) 


Finding information about a URL 


Python has two useful tools for getting the scoop on a URL. 
infoO 


To get meta-information (server type, last-modified date, content type, and so on) about a 
URL you've opened, use the info() method, like this: 


>>> print page.info() 

Daices Suln, Zil Wen ZOWG WO7 e525 15 GMM 

Server: Apache/2.0.54 (Debian GNU/Linux) DAV/2 SVN/1.1.4 

mod _python/3.1.3 
Pythons 2,355 moc ssl/2.0.54 OosmsSSib/ 0.9. 1 


310 


Last-Modified: Mon, 17 Apr 2006 12:04:05 GMT 
Ime Keg VAAN IL LO =[ere 5 — sitesi illo) 74! (0)"4 

Accept-Ranges: bytes 

Content-Length: 35525 

Connection: close 
Content-Type: text/html 


geturl() 


To get the actual URL that was opened when you made your request, use the getur1 () 
method, like this: 


>>> redirect = 
urllib2.urlopen('http://www.livejournal.com/users/firecat') 

>>> redirect.geturl () 

Uigieicjos / /euies@ane . liweyouuen-al .conn/ 

Tip Use getur1() to find out whether the URL you opened redirected you somewhere 


else. 


Processing special characters in a URL 


To process special characters (such as a space or tilde) in a URL, use the ur11lib 
module's quote plus () function. 


Warning Although the functions quote () and unquote() are similar to quote _plus(), 
we recommend using quote plus () to avoid problems with URLs that contain 
+ and space characters. 


The quote plus () function has a required argument and an optional argument. 


Required argument 


The quote plus () function requires a URL string (or part of a URL) as an argument and 
does the following: 


e Replaces special characters (anything but letters, digits, underscores, dots, and 
hyphens) by using the "sxx" escape format 

e Replaces spaces with + (required for quoting HTML form contents) 

e Escapes + characters in the original string 


This example processes a URL with slash characters and a tilde (~): 


SSS amo Wiel alo 
SSS uelliloseuecs jollus (ineios/ /eeie—eincl—Clraciom . coms ~SiuSit ” ) 
"https3AS2F%2Fcat-and-dragon.coms2F%S7Estef' 


Tip To reverse this process, use unquote plus (). (This function doesn't use any optional 
parameters.) 
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Optional argument 


The quote _plus() function takes an optional safe argument, which specifies characters 
to leave alone. 


The following example processes the same URL as the preceding example, but the slash 
character has been designated as safe: 


BSS wnclliloeoces jolts (inceas / /Cenceincl-Clicaicoin.com/~Sicsir!, selice=! /!) 
"http%3A//cat-and-dragon.com/%7Estef' 


Submitting form data 


When communicating via HTTP, the urlopen() function usually makes a GET request to 
the HTTP server. A GET request limits your URL (address and data) to 1,024 bytes. To 
send more data—for example, to submit the results of a form—you need to make a Post 
request. 


Sending data via a POST request requires the following steps. Note that these steps require 
you to import both ur11ib (for the urlencode() function) and ur11ib2 (for the 
urlopen() function): 


Type the following: 
dmjoowe willie, well Iale2 


BR WDNR 


Use the urllib.urlencode() function to encode the data. 
The function takes one of the following arguments: 


o Amapping object (such as a dictionary of form fields) 
o Asequence of two-element tuples 


The function converts its argument to the format application/x-www-form- 
urlencoded. Here's an example: 


SSS mvs = (il, Tome), (2, Mewol), ("SY , Vieluese) | 
>>> mydata = urllib.urlencode(mylist 
>>> mydata 
"l=one&2=twoé&3=three' 


— 


5. Call urllib2.urlopen() and pass the data from urlencode() as the second 


argument, like this: 
6. X = urllib2.urlopen('http://www.company.com', mydata) 
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Taming the Wild URL 


The urlparse module splits URLs into components, combines components into a URL, 
and converts a relative URL to an absolute URL. 


To split a URL string into components, pass it to the urlparse() function. There are two 
optional parameters for urlparse (): 


¢ default scheme: Used if the URL doesn't include an addressing scheme, for 
instance, 'http'. The default is an empty string (which means it's off). 

¢ allow fragments: Allows URLs with elements following a # character 
(designating a location somewhere inside a page). Defaults to 1, which means to 
allow them. 


The urlparse() function returns a six-item tuple of strings containing 


Addressing scheme (http or ftp) 

Network location (http://www.python.org) 

Path (/doc/2.4.2/lib/module-urlparse.html) 
Parameters (; type="a") 

. Query (?filter=16400) 

Fragment identifier (#top) 


DAAWRwWNS 


The tuple items don't include delimiters (colons and slashes), except for a leading slash in 
the path component. If any of the items isn't part of the URL being parsed, the tuple 
contains an empty string for that item. 


The following example contains an addressing scheme, network location, and path, but 
doesn't contain parameters, a query, or a fragment identifier: 


>>> urlparse.urlparse ("http://python.org/doc/2.4.2/lib/module- 
urlparse.html") 

(Moca, “ovviciioin.soine!, "/cloe/2.4.2/ilio/mocwlaqualoacseloicml!, 89, T0, 
bat) 


The urlparse module has a couple of other capabilities: 


e To turn a tuple created by urlparse() back into an URL string, pass the tuple to 
urlunparse(). 

¢ To stick together a base URL and a relative URL to create a complete URL, use 
urljoin(). It takes the base URL and relative URL as strings. It also takes the 
optional allow fragments argument. This example joins a single Web page to its 
location: 


e >>> urlparse.urljoin('http://python.org/doc/2.4.2/mac/', 
"mac.html') 
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e 'http://python.org/doc/2.4.2/mac/mac.htm1' 


Getting Hip with Hypertext 


To read and output HTML-formatted text files and to find links in HTML documents, use 
the html1lib module. It supports all of XHTML 1.0, all of HTML up to 2.0, and much of 
HTML 3.0 and 3.2. 


Of parsers, formatters, and writers 


Reading and outputting an HTML file is a three-step process. Here's what's going on 
inside Python while it's happening: 


1. A parser receives information from an HTML-formatted file. This file might be an 
HTML document you have on disk, or it might be a Web page you've opened with 
urllib2.urlopen() (see "Downloading Web Data," earlier in this chapter). 


When the parser encounters an opening tag, it stores whatever is inside that tag in 
a buffer until it encounters the matching closing tag. 


2. The parser passes the information to a formatter, which reformats it into a human- 
readable form by reading the HTML tags and applying instructions to them. 

3. The formatter passes the reformatted information to a writer, which outputs the 
information. 


Setting up a read-and-output process 

The parser, formatter, and writer are all instance objects, so you need to create the 
instances before they can act. The parser works on the contents of the file or Web page, 
so you have to open and read a file or Web page before the parser, formatter, and writer 


can act. 


To set up a read-and-output process, follow these steps: 


html = myfile.read() 
Create a writer instance. 


1. Import the htmllib, formatter, and StringIo modules, like so: 
2. import htmllib, formatter 

So iieOMl CSiciesine WO) sinsrouee Siew sljaves UO 

4. Open an HTML-formatted file: 

5. myfile = open("memo.html", 'rb') 

6. Read the file: 

Vo 

8. 


The simplest writer is DumbWriter (), which outputs plain text. 
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dumdum = formatter.DumbWriter (StringI0O() ) 
9. Create a formatter instance and pass it the writer instance. 


AbstractFormatter () 1s the most commonly used formatter class. 


fermat = formatter.AbstractFormatter (dumdum) 


10. Create an HTMLParser instance and pass it the formatter instance. 
1l.parsley = htmllib.HTMLParser (fermat) 

12. Feed the file contents to the parser. 

13.parsley.feed (html) 

14. Close the parser and the file. 

15.parsley.close() 

16.myfile.close() 


Outputting the links of a Web page 


After the parser has read the document, the document's links are available in the data 
attribute anchorlist. 


>>> parsley.anchorlist 
['http://www.portmeirion-village.com/'] 


Getting help for messy HTML 
If you have to deal with badly formatted HTML (and there's a lot of it out there), don't 
despair; a third-party tool can help. It's called Beautiful Soup (after the poem in Alice's 


Adventures in Wonderland), and full details are available at 
http://www.crummy.com/software/BeautifulSoup. 


The Great XML 


Python comes with modules that support Extensible Markup Language (XML), a 
mechanism for creating structured documents in plain text. 


TECHNICAL Although we wrote plain text above, XML is actually often encoded 


STUFF in UTF-8. See Chapter 6 for details about character encoding. 
TECHNICAL Both HTML and XML data include tags—text surrounded by 
STUFF brackets (<>). But they use tags differently: 


e In HTML, tags are specific formatting instructions. For 
example, <p> starts a new paragraph and <b> marks boldface 


type. 
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e In XML, tags label kinds of data. 


An XML tag can also have subtags for kinds of data that are 
part of a larger group—<address> data can include <name> 
and <street>; <item> data can include <price>, and so on. 
These tags clarify the data structure, so it's easy for a 
computer to separate the data into parts. 
Tip If XML tags and subtags remind you of Python's classes and their data attributes, you 
might be on to something.... 


The ElementTree XML implementation 
The best way to use XML in Python is with ElementTree. 
e ElementTree is included as xm1.etree in Python 2.5. 


e ElementTree is available for download from http://www.effbot.org.zone/element- 
index.htm if you're using an earlier version of Python. 


ElementTree is based on an Element data type, which stores hierarchical data structures 
(such as XML) in memory. 


TECHNICAL Ordering up some XML 

STUFF 
Here's an application for which XML is useful. Suppose you need to 
send an order. If you were writing it on a piece of paper, it would 
look something like this: 


Order: 

Eric Half-a-Bee 
123 Main St 
Anytown, CA, 95432 


Items: 
Foo 1 $12 $12 
Bar 2 S15 $30 


Subtotal: $42 
Shipping: $5 
Total: $47 


In XML, it would look something like this: 


<?xml version="1.0" encoding="ISO0O-8859-1"?> 
<order> 
<recipient> 
<name>Eric Half-a-Bee</name> 
<street>123 Main St</street> 
<city>Anytown</city> <state>CA</state> 
<zipcode>95432</zipcode> 
</recipient> 
<items> 
<item> 


316 


<prodcode>Foo</prodcode> 
<qty>1</qgty> <price>12</price> 
<total>12</total> 
</item> 
<item> 
<prodcode>Bar</prodcode> 
<qty>2</qty> <price>15</price> 
<total>30</total> 
</item> 
</items> 
<subtotal>42</subtotal> 
<shipping>5</shipping> 
<total>47</total> 
</order> 


Importing ElementTree 
How you import ElementTree depends on your version of Python. 


e In Python 2.5, type this: 


e from xml.etree import ElementTree as ET 


e Inearlier versions, after you download ElementTree, type the following: 
e import elementtree.ElementTr as ET 
Tip (You don't have to use the as ET part, but it will help if you are following 
along with our examples.) 


Creating an element 


An ElementTree structure consists of Element instances. 


e An£Element instance must have a tag—a string identifying the element type. 
e An£Element instance can have these options: attributes, a text string, and 
subelements (child elements). 


To create an element, type ET.E=lement and pass the tag string as an argument, like this: 


Eeecn= mivnlement@inee™) 

Element attributes 

Element attributes are dictionary key:value pairs. (The keys must be unique.) 
Assigning attributes 


To assign attributes to an existing element, use the attrib () method, like this: 


branch = ET.Element ("branch") 
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ieieallo) 
iBieaLlo) 


branch.at 
branch.at 


me 


(Moma ] = wou 
Menara” J = wou 


Manipulating attributes 


To manipulate attributes, use the following methods, which work a lot like dictionary 
methods (see Chapter 9): 


e To assign attributes when creating an element, use keyword arguments, like this: 


e 
Ho 


e 
Ho 


ranch. 


e 
Ho 


e 
(oy 


ranch. 


ranch = ET.Element ("branch", one="1", two="2") 

o return attribute values, use the get () method with the key: 
ranch.get ("one") 

o get a list of attribute keys, use the keys () method, like so: 


keys () 


o get a list of attributes (key:value tuples), use the items () method: 


items () 


e To set attribute values, use the set () method, like this: 


O loreeiaela 


Sere (Melnaliecl, 33") 


e To set an element's text attribute, assign it a string, like so: 
e branch.text = "this element also contains text" 


Subelements 


Element instances can have subelements, also called child elements. 


Creating subelements 


There are a couple of different ways to build a tree: 


Element () function: You can create subelements and attach them to the 


parent element by using the SubElement () function. The resulting code looks like 


e The sub! 
this: 
e Dat 5 Swulols 


lement (tree, "“branch") 


e The append() method: You can attach subelements to a parent element with the 


append ( 


) method, like this: 


e tree.append(ET.Element ("branch") ) 
Tip Subelements (child elements) are stored as a list, so you can access, add, and delete 
them by using list methods, like this: 


>>> tree.insert(0, ET.Element ("fruit flies") ) 


Searching for subelements 


To search for or 


e find(pa 


within subelements, use these methods: 


ttern) returns the first matching subelement. 
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¢ findtext (pattern) returns the value of the text attribute for the first matching 
subelement. 
¢ findall (pattern) returns a list of all subelements. 


Tip The pattern argument in find() can be either 
e A tag: If you use a tag, only subelements of that tag are checked. 


e A path: You can use a path to search the entire subtree. 

Tip To search the whole tree, use the getiterator() method. getiterator (tag) 
returns a list of all subelements with the tag; getiterator() (without an argument) 
returns a list of all subelements in the subtree. 

TECHNICAL getiterator() searches the tree in depth-first order; that is, it 

STUFF searches down one branch to the end, and then down the next branch, 

and so on. 


Using XML files 


ElementTree is designed to automatically read XML files, represent their structures, and 
write them to disk or to a Python file object. 


Loading documents 


To load an XML document and turn it into an Element Tree instance, use the file 
keyword argument to create a tree from a file in a single operation, like this: 


order = ET.ElementTree(file='order.xml') 
Writing an Element tree to disk as an XML file 


To save an ElementTree instance back to disk, use the write () method. It takes either a 
filename or a file object. The output might not be human-readable. 


order.write (output_file) 


To save an Element structure to disk, use the following code: 


ET.ElementTree (tree) .write(output_ file) 

Other useful XML modules 

Python includes a few other XML modules: 

e xml.dom: New in Python 2.0. Supports Document Object Model (DOM), a 


method for reading and modifying XML documents that works in a variety of 
programming languages. 
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Tip DOM creates XML documents in a tree structure and is useful for random- 
access reading. 

¢ xml.dom.minidom: Supports Document Object Model (DOM) but is simpler and 
smaller than the full xm1.dom module. 

¢ xml.sax: This package implements Simple API for XML (SAX). It's good for 
huge XML documents because it doesn't read the whole thing in one chunk. 
However, SAX is more difficult to use than the other modules, so we recommend 
you use it only when you really need it. 


MIME-ing Success: Managing E-Mail Messages 


The email package is a library that includes tools for reading the text of e-mail messages, 
transforming messages from text to Python objects and back again, and generating e-mail 
objects, including MIME documents. 


REMEMBER The email package doesn't send messages; the smtp1ib module sends 
them. 


Representing an e-mail message in Python 


In Python, e-mail objects are represented as instances of the Message class. AMessage 
instance has two parts: 


e Message headers: Addressing information, subject, date, and so on are stored in a 
dictionary-like format (a mapping), but there are a few differences from 
dictionaries: 

o Duplicate message headers are possible. 
o Message headers are stored in the order they appeared in the original 
message. 

e Payload: The body of the message can be either a string (a text-only message) or 
a list of Message objects (a multipart message; for instance, a message that has 
text, HTML, a GIF file, and so on). 


Creating e-mail and MIME objects 


To build a message structure, you create a Message instance and add attachments and 
headers. 


Creating a Message object 


The simplest way to create a Message object is to use the message from string() or 
message from file() function. 


To create a Message object from a string, use message from string(). Pass the 
function a string or StringIo instance, like this: 
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>>> import email 
>>> msg = email.message from_string(mystring) 


To create a Message object from a file, use message from file(). Pass the function an 
open file object. 


Creating MIME objects via subclasses 


The email module's MIME submodules are for attaching particular kinds of data, such as 
formatted text documents, to an e-mail message. The subclasses automatically set up 
some of the MIME headers for you. 


The name of the class is always the same as the name of its corresponding submodule. To 
import a class from a MIME submodule, type the following line of code (substitute the 
name of the class/submodule you want to import): 


7 7 


from email.MIMEBase import MIMEBase 


To create an instance of the class after using the preceding import statement, type code 
like this: 


= 


mime msg = MIMEBase (parameters) 


The following MIME subclasses create messages that include MIME objects. 
MIME Multipart() 


The MIMEMultipart () subclass is an outer layer for MIME messages that have multiple 
parts. 


To use Message objects (or other MIME subclass instances), you can either 


e Specify the objects as payload subparts when you create an instance of the 
MIMEMultipart subclass. 
e Add the objects later by using the Message.attach() method. 


TECHNICAL The MIMEMultiPart subclass is new in Python 2.2.2. 
STUFF 


MIMEBase() 


The MIMEBase() subclass is for creating MIME messages with message parts that aren't 
of specific other MIME types (text, audio, or image). 


MIMEBase () requires two Content-Type arguments: 
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e Major type (for example, application) 
e Minor type (for example, msword) 


TECHNICAL If you aren't sure what types to use, import the mimetypes module 
STUFF and use the guess _type() function, like this: 


SS WMUMSEVYSSS GUESS yee ("OO cloe ") 
('application/msword', None) 

>>> x = email.MIMEBase.MIMEBase('application', 
"msword' ) 


Tip To attach a Word document to an e-mail message, use the MimeBase() subclass: 


Import the modules you need, like so: 

. >>> from email.MIMEBase import MIMEBase 
Open the Word file for reading: 
SSP wOiiel sealle > iS (VO Coe, Vielo")) 


1 

2 

3 

4, 

5. Read the file, and then close it: 
6 

7 

8 

oF 

1 


2S) tile CoOmtenes = wore rile. weacl(() 
o S22 wore rile. close (() 


Make a MIMEBase () object by typing this: 


SSS hws wise) = jis ise, = MODMIB eS (Teyojollaleeicalom'',, Vansiteietel! ) 


0. Set the payload of the MIMEBase () object: 


11.>>> mime_msg.set_payload(file contents) 
Specifying a character set 

To specify the character set of a payload, use the set_ charset () method. 
TECHNICAL The set_charset () method argument can be either 
STUFF 


e AcCharset() instance 
e Astring with the name of a character set 


¢ None (which removes the charset parameter from the 
Content-Type: header) 


The following line specifies the character set using a string: 


Pe mS. Sei Cleese (" LSO=8e59= 1") 
Adding and changing message headers 
Message object headers are stored in a dictionary-like object: 


e The header field is the key. 
e The content of the header is the value. 
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To create or change a header, or to add a value, use the key, like this: 


>>> msg['Subject'] = 'Pictures of My Cat’ 


The add_header() method creates a new header, a value, and optional parameters for the 
value: 


Pe MSC) oaiClcl_ inservice (( WMiyCarc! , ‘Vanes! , COsie='Wue<éyclo " ) 


The set_param() method sets or changes a parameter in the Content-Type: header, 
creating the header if it doesn't exist. 


Adding content to an existing message 


To add a payload to a multipart e-mail message that already has an existing payload, use 
attach() and pass it a list of Message objects. 


If the message doesn't have any payload yet, use set_payload() to add a payload stored 
as a string. 


Warning If the message already has a payload, the set_payload() method replaces it. 
Generating MIME documents from message structures 


After you have set up an e-mail message as a structure of Python objects, you need to 
convert it to plain text if you want to send it or print it. This is called flattening the 
message. There are several ways to flatten a message object. We discuss two of them. 


Printing a message object as a string 


The easiest way to print the text of a message is to use the as_string() method of 
Message () objects, like so: 


SSS joie iicicllenisce 2s sisi (() 
Daces Wee, DS wJull 2ZO06G OLs@4s427 =<O 700 
From: Professor Grue 
To: Geography 101 students 
Subject: Islands of the world 


Farallons: 
Seal Rock 
Maintop Island 


Encoding binary data into string data 


The Encoders module includes functions for turning binary data into data that can be 
sent as e-mail. Two of its functions are 
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*¢ encode _quopri(): Use for a message that has mostly text but some unprintable 
characters. 

e encode baseé4(): Use for a message that has mostly binary (unprintable) data. 
The format is more compact than quoted-printable, but humans can't read it. 


The encoding functions add a content-Transfer-Encoding: header. They change a 
message in place and return None. To encode the Message () object littlemsg as base 
64, type this code: 


>>> import email.Encoders 

>>> email.Encoders.encode base64 (littlemsg) 
>>> print littlemsg 

From nobody Wed Jul 5 01:04:42 2006 

From: Professor Grue 

To: Geography 101 students 

Subject: Islands of the world 


im 


Content-Transfer-Encoding: base64 


RmFyYWxsb25zOgpTZWFsIFIJvY2sKTWFpbnRvcCBJc2xhbmOQKCgoK 


Reading e-mail messages 
This section shows how Python can read and manipulate existing e-mail messages. 
Getting information and payloads 


The simplest way to return an e-mail message as a text string is to use the as_string() 
method of a Message () instance. To include an envelope header in the string, specify 
unixfrom=True as the argument. 


mymessage.as_ string (unixfrom=True) 


To find whether the e-mail message has multiple parts, use the is multipart () method. 
It returns True or False. False means the message is a string. 


To see a message's payload, use the get_payload() method. If the message is multipart, 
it returns a list of Message () objects. If not, it returns a string. 


>>> msg.get_ payload () 

[<email.Message.Message instance at 0x82e26ec>, 
>email.Message.Message instance at 0x82e260c>] 
>>> littlemsg. get_payload() 

"Farallons:\nSeal Rock\nMaintop Island\n 


Parsing e-mail messages 
The classes of the Parser module process more complex messages. When you already 


have the whole message as a string or file, use its "classic" Parser class. Follow these 
steps: 
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Import the Parser module, like so: 
. >>> import email.Parser 


Create an instance of the Parser () class: 
. >>> p = email.Parser. Parser () 


OR WMH 


Use the parse() method for a file (pass it a file object) or the parsestr () 
method for a string (pass it a string). 


The text must be a block of headers followed by the body of the message (which 
can have MIME-encoded subparts). To read only the headers, pass the optional 
parameter headersonly=True. 


>>> X = p.parse (fp) 

Pe JOIILINIE 3x 

From nobody Wed Jul 5 01:04:42 2006 
From: Professor Grue 

To: Geography 101 students 

Subject: Islands of the world 


Farallons: 
Seal Rock 
Maintop Island 


Using e-mail utilities 
The email.Utils module includes the following message tools: 


e Functions for managing quotation marks and other delimiters 
o quote() takes a string and escapes backslashes and double quotes by 
preceding them with \. 
© unquote() removes double quotes and angle brackets from the beginning 
and end of a string. 
e Functions for reading address headers 
o parseaddr() splits the header into two parts—real name and e-mail 
address. It returns a tuple. 
o formataddr() takes the parseaddr () tuple and turns it back into a string 
suitable for an address header. 
o getaddresses() takes a list of headers and returns a parsed list of 
addresses found in the headers. 


This code gets all the addresses in a message's To: header: 


from email.Utils import getaddresses 
co Beckles = msg .qee_ all(’coe’, Il) 
EO _KECIOS = CSitacklnesses (ic@_aAciclies)) 


¢ Tools for working with dates 
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o parsedate() reads a date in a message header and returns a nine-item 
tuple that works with time.mktime() (see Chapter 17). If the date doesn't 
follow the standard format, parsedate() tries to guess how the date is 
formatted. 

o formatdate() takes a floating point time value (or uses the current time) 


and returns a date string of the following format: 
Oo weil, OF Nov 2001 OlsOssay —OOOO 


e Functions for decoding and encoding character sets 

o decode rfc2231() decodes a string according to RFC 2231. 

o encode rfc2231() encodes a string and accepts optional character set 
and language arguments. 

o collapse rfc2231 value() turns a get_param() three-item tuple 
(character set, language, and value) into a Unicode string. 

o decode params () decodes a list of two-item tuples in the format (content- 
type, string-value). 


Simply SMTP 


The email package only generates and processes e-mail messages. To send e-mail 
messages, you need the smtp1ib module. SMTP stands for Simple Mail Transfer 
Protocol. 


The primary way to use the smtp1ib module is to create an instance of the smTP class (its 
parameter is a hostname); then use the sendmail () and quit () methods in that order. 
The sendmail () method requires the following arguments: 


e From-address string 
e List of to-address strings, or a string containing a single to-address 
e Message (a text string—usually a flattened email .Message object) 


TECHNICAL The optional arguments for sendmail () are advanced, and 
STUFF we don't cover them in this book. 


The response from sendmail () depends on the following factors: 
e Ifthe server accepts the mail for at least one of the to-addresses, sendmail () 
reports success. 
e If any of the to-addresses didn't receive the message, sendmail () returns a 
dictionary containing one entry for each recipient who didn't get the mail, along 


with error codes. 


To end the SMTP session and close the connection with the server, use the quit () 
method. 


Tip The simplest code for sending a message is the following: 
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import smtplib 

server = smtplib.SMTP('localhost") 
server.sendmail(fromaddr, toaddrs, msg) 
server.quit() 


CGI: Gateway to the Web 


Common Gateway Interface, or CGI, is a standard scripting language for Web sites. CGI 
scripts work like this: 


1. A browser makes a request to a Web server. 
2. The Web server executes a CGI script. 
3. The CGI script output is returned to the browser as a Web page. 


You can write CGI scripts in Python. The cgi module includes tools that let your scripts 
work with Web pages and receive data from form fields. Debugging tools are also 
provided. 

Setting up CGI output in Python 

A CGI script usually outputs two sections of text separated by a blank line. 


Tip To generate this text from your Python program, use the print command: 


1. The first section contains headers describing the type of data in the second 


section. Code for this section might look like this: 
, pieime NComesmeSihyoess ieeoxic /micnall” 


2 
3. Use an empty print command to generate the blank line separator: 
4 
5 


» jeresliaic 
The second section usually includes HTML-formatted text that generates the 
Web page, like this: 

6. print "<TITLE>Message in a browser</TITLE>" 

To Pee "KES ESIIO, woeieledl</sis” 

8. print "<P> Help! I'm trapped in this Web browser" 
Q. josie VYeinel Ul Caine cece owe! «</1eS 


Reading data from Web input 
CGI scripts usually process input from a Web form created with the HTML tag <form>. 


Form fields can contain data of various types, but you usually want to treat the data as a 
string or a file. 
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Reading data from form fields 


To get form data, you create an instance of the FieldStorage() class (without 
arguments): 


e The instance reads form contents from standard input or from the appropriate 
environment. 

e The instance stores data in a dictionary-like structure in which a field name is 
associated with its value. 


Unlike Python dictionaries, instances of the FieldStorage() class allow for multiple 
keys with the same name. They support such methods as has_key(), keys(), and len(). 
After you've stored the data, the easiest way to get the values out of individual fields is to 
use the getfirst() and getlist() methods. 


Tip These methods protect against the possibility that a user entered too many values in a 
field. 


getfirst() 


If a field is supposed to have only one value, use the get first () method of the 
FieldStorage instance to return a single value. Pass the name of the form field as an 
argument, like this: 


addr = form.getfirst ("addr") 


The getfirst() method takes an optional parameter specifying what to return if the 
form field doesn't exist or has no value. It defaults to None. 


getlist() 


To return a list of all the values associated with a form field, use the getlist () method 
and pass the name of the form field. If there is only one value in the field, the method 
returns a one-item list. If there are no values or the form field doesn't exist, it returns an 
empty list. 


To process the list of values, use a for loop, like this: 


for 166i im LOM qerlise ("tielel Ineackeie”)) 3 
do_something (item) 


Warning Maintaining security when using HTML forms 


Forms on HTML pages are vulnerable to security problems—malicious code 
can be entered. If you are planning to use form data in a shell command on your 
computer (via the os.system() function, for example), include tests in your 
script to make sure the data contains only alphanumeric characters, dashes, 
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underscores, and periods. Specifically do not allow spaces. Whenever possible, 
use Python functions (for instance, shutil.move () ) instead of calling 
os.system() with data entered by end users. 


Similarly, when sending user-entered data back to the browser, make sure to 
strip HTML tags, especially if you have a Web site where users can see data 
entered by other users (such as a blog). 


Reading files from form fields 


To find whether a field stores a file, test the £i1e data attribute of the FieldStorage 
instance with an if statement, like this: 


fileitem = form["userfile"] 
iat seal LSaicem tees 
# do something 


Setting up and installing a CGI script 


Your script must follow certain guidelines to work in the Web environment. The 
following instructions show how to write and install your script. 


Writing a script to work in the Web environment 


Type the following code as the first line of the script (replace the pathname with the 
pathname of the Python installation on your Web server): 


#!/usr/local/bin/python 
Tip Use absolute paths in your script to access programs or files. 


To load modules from a location not on the default Python search path, use 
sys.path.insert () to add their pathnames to sys.path. Use 0 as the first argument to 
have this path searched first. For example, you might type 


import sys 
sys.path.insert(0, "/usr/home/stef/python_ modules") 


Checking for syntax errors 


Run your script from the command line to see whether it contains syntax errors. This step 
makes sure your script contains valid Python code: 


[oS 


& PWetlaGim SCrUpSe ow 
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Installing a script and setting permissions 


Your CGI script usually goes in the cgi-bin directory of your Web server. Check the 
documentation for your Web host provider to find out any requirements for the script— 
for example, some ISPs require that all CGI scripts end with . cgi. 


After you've installed the script in the appropriate directory, you need to set permissions 
so that users of your Web site can access it. Here's how: 


1. Change the permissions for the script to let the "others" group read and 
execute the script. 
Tip On UNIX, the command for giving the "others" group read and execute 
access iS chmod 0755 scriptname. 
2. Ifthe script uses any files, change those file permissions so that the "others" 
group can read or write the files (depending on how the script uses the files). 
o Reading the files: The UNIX command for making files readable by the 
"others" group 1s: chmod 0644 filename. 
o Writing the files: The UNIX command for allowing the "others" group to 
write to files is chmod 0666 filename. 


Debugging CGI scripts 
CGI scripts don't handle Python's error tracebacks the same way that Python itself does, 
so you need to follow special procedures for debugging your CGI scripts. The rest of this 
chapter shows you how. 
Check for syntax errors 
Before doing any other debugging steps, double-check your script for syntax errors (yes, 
even if you already did it once). Follow the instructions in "Checking for syntax errors," 
earlier in this chapter. 
Make sure your script is installed correctly 
Use these two steps to install and check the script: 

1. Install a copy of the cgi.py module in the location of your script. 

2. Run the cgi.py module by typing code like this into your browser (replacing 

hostname/cgi-bin/ with the path to your cgi-bin directory): 


http://www.hostname/cgi-bin/cgi.py?name=My+Name&addr=Cyber 


The results show whether the script is installed in the correct directory: 
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If the directory is correct, some diagnostic text, including the text after the ? in the 
above line of code, appears in the Web browser. 

If you see a 404 error or a message that the page can't be found, then the scripts 
are installed in the wrong directory. 


You might get a different error if the script is in the right directory but installed 
incorrectly. 


Test your code 


If your script doesn't work, and you've verified that it's installed in the correct directory, 
these steps test your code by using the cgi.test () function: 


1. 


a 
3 


In a text editor, comment out the main body of your code (put a # character at 


the beginning of every line) and add this line of code: 
cgi.test () 


Run the altered script by typing code like this into your browser (replace 
hostname/cgi-bin/ with the path to your cgi-bin directory and replace 
scriptname With the name of your script): 


http://www.hostname/cgi-bin/scriptname.py?name=My+Name&addr=Cyber 


If the diagnostic text and the text after the ? in the above line of code appears in 
the Web browser, then the script is installed correctly, and any problems you're 
having are probably with your script's code. 


Tip If the cgi.test() code doesn't work, then the script probably isn't installed 
correctly. Go back to "Installing a script and setting permissions," earlier in 
this chapter. You might need to ask your Web hosting company for the path of 
your cgi-bin directory. 


Include debugging code in your script 


If your script is executing some code but works incorrectly, set up your script to display 
helpful debugging messages. Use one or both of the following methods. 


Seeing CGI script errors 


The cgitb module displays error information for CGI scripts. Import and enable the 
cgitb module by including one of the following code lines near the top of your script, 
after the #! /usr/local/bin/python line: 


This line displays errors in the Web browser: 

import cgitb; cgitb.enable() 

This line saves errors to a temporary file: 

import cgitb; cgitb.enable(display=0, logdir="/tmp") 
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Seeing traceback information 


If your script isn't working correctly, it might be producing traceback information (the 
error messages that Python normally prints when you run a program from the command 
line or from interactive mode). Depending on the configuration of your Web host, the 
traceback messages sometimes appear in the HTTP server's log files. How you access 
these files depends on the server setup. Contact the support team for your Web host to 
find out more. 


If you don't want to dig through the log files, you can set up your script so that traceback 
information is displayed in your Web browser. Include this code near the top of your 
script: 


import sys 

sys.stderr = sys.stdout 

joxcatinie MCoinviceime—mwyoas teenie /jolleaiinl! 
pone aL ane 
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Part V: The Part of Tens 


Chapter List 


Chapter 21: Ten Critical Python Idioms 
Chapter 22: Ten Great Resources 


The 5" Wave By Rich Tennant 


"Well, here's your problem. You only have half the ram you need." 
In this part ... 


Lists, lists, and more lists! For Dummies readers know that they can count on the Part of 
Tens to provide yummy tidbits of information, and this Part of Tens is no different. 


In this part, you get ten chunks of code that solve annoying little programming problems 
in elegant ways; just drop them into your programs. We also show you ten free resources 
you can use to find out more about Python and hook up with the rest of the Python 
community. 
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Chapter 21: Ten Critical Python Idioms 


Here are ten little code snippets that illustrate how to handle some common problems in 
Python. They are arranged roughly in order of difficulty. 


TECHNICAL Some of these idioms are from the Python Cookbook, and you'll find 
STUFF more: 


http://www.pythoncookbook.activestate.com 


Collecting Globs and Globs of Files 


The glob module provides a single function, also called glob, which lets you get a list of 
files from a directory (you can also find directory names). The glob function recognizes 
the same wildcard search characters that the UNIX shell recognizes, except for tilde (~). 


TECHNICAL On Mac and Unix systems, glob searches are case-sensitive. 
STUFF 


Here's how to find all the Word documents (which end with the suffix .doc) ina 
directory called MyDocs: 


>>> import glob 

>>> docs = glob.glob('MyDocs/*.doc') 
['MyDocs/helloworld.doc', 'MyDocs/myspider.doc', 

"MyDocs/pagecount.doc'] 


The following example finds files beginning with . (a dot). These files aren't found in 
wildcard searches. 


SSS Clolo.gllela(! .*") 
[Y olaaickcleim iealeY | 


Rolling Dice and Shuffling Cards 


Python's random module is good for generating simple random numbers (and for 
sophisticated random number problems, but we won't get into that—we're just as eager to 
avoid statistics as everybody else). 
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Making a saving throw 


Here's a simple dice-roller function. To use it, just call it with an optional integer 
argument: 


import random 
def roll (d=6): 


return random.randint(l, d) 
SS ieellil () rolls a 6-sided die 
4 
SSS woll (20) rolls a 20-sided die 
LF 


Playing the dealer 


Now suppose you have a list representing a deck of cards. Here is some code that would 
create such a list: 


Clic mes Clee (()) 3 


suits = ['spades', 'hearts', 'diamonds', 'clubs"] 
values = [Yace", Ae Tg0.. AGE OG 0. 16%, ogo Teo. 0@o . CRO? » 
ae eae, 


"queen', 'king"] 
deck = [] 
FOr S aA Suess 
if@ie Ww alin welluess 
deck.append(v + ' of ' + s) 
return deck 


To get a single card, run the function to make a deck and pick a card with 
random. choice () like this: 


PSS COCK = imelse CCl () 
>>> random. choice (deck) 
"3 of diamonds' 


If you want to pick a few cards, use random. shuffle () and then use the pop () method 
of lists to retrieve cards one at a time. 


>>> newdeck = make deck() 
>>> random.shuffle (newdeck) 
>>> hand = [] 
SS> rO@le © alin welinee(S) 3 
hand. append (newdeck.pop () ) 
>>> hand 
['5 of diamonds', '6 of diamonds', ‘ace of spades', ‘ace of 
Glluliogs), Vikan; OE Ineereies | 
Tip Because shuffle() and pop() change the list in place, you should make a new deck 
each time you deal. 
Tip In Python 2.2 and earlier, shuffle () works well only with small lists. Python 2.3 
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uses a new random number algorithm so that shuffle () to work correctly with lists 
containing as many as 2,000 elements. But someone forgot to update the docs for 
Python 2.3 and 2.4; they don't reflect this change. 


336 


Uniquely Ordered Lists 


Python's set data type makes it easy to find all the unique items in a list. (See Chapter 9 to 
find out about sets.) But sets don't work well if you want to keep the order of items. This 
function takes a list and returns a new list containing only its unique items, in the same 
order as the original list: 


def unique (items): 

ime = |) 

cache = set() 

for item in items: 

if item not in cache: 

tmp. append (item) 
cache.add(item) 
WSU IeiMNS) 


If you're concerned about memory usage, you can use a generator version of the 
preceding function (see Chapter 16): 


def unique (items): 
cache = set() 
GI GLICEM aiay Weems s 
Wit Wieeih MOE sin Cacines 
cache.add (item) 
yield item 


Here's how they work: 


>>> menu = ['egg', 'bacon', 'egg', 'sausage', 'bacon', ‘'egg', 

"spam', ‘egg', 'bacon', 'spam', 'egg', 'bacon', 'sausage', 

Vga, Vision, Vioecom'’, Virco’,  sjoeun , 

"lobster thermidor with truffle pate, brandy, and a fried egg on top', 
"spam', ‘baked beans" ] 

>>> unique (menu) # preserves item order 

['egg', 'bacon', 'sausage', 'spam', 'tomato', ‘lobster thermidor with 
EIeULIE iE IS 

pate, brandy, and a fried egg on top', ‘baked beans'] 


Reversing Your Way to Success 


The unique function in the previous section keeps the first item it finds and throws away 
subsequent duplicates. What if you want the last matching item? Most of the time, adding 
the double-reverse technique does what you want: 


my data.reverse() 
my data = unique (my data) 
Hin CENCE. HEWSIeSS () 
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Note that the sort () method of lists and the sorted() built-in function, which puts list 
items in alphanumeric order, both have a reverse parameter (new in Python 2.4). See 
Chapter 8 for the lowdown on sort () and sorted (). 


Exceptional Type-Testing 


Sometimes it's a good idea to check the type of data that is being put into your function to 
make sure it's the type of data you want. You can use isinstance() Of issubclass() to 
do this. But that's not the ideal test for code that's intended to work with user-defined 
classes based on built-in data types. Python's duck-typing (see Chapter 13) makes it 
possible to create an object that has an append method, for example, that isn't an instance 
or subclass of the list data type. This object might work in your function but would not 
pass the isinstance() Or issubclass() test. 


If you want your function to process such objects, it's better to use a try/except block to 
verify that the object behaves the way you need it to. For example, you might test 
whether an object supports the append() and pop () methods and whether it allows 
indexing. (If so, there's a good chance it will support other list operations.) Here's a code 
block that performs such a test: 


def foo(l): 
ileslanyas 
l.append('foo') 
1[0] 
1.pop () 


except StandardError: 
raise Typekrror("Not a valid list object: tr" 3 1) 


TECHNICAL This code catches StandardError in the except statement. 

STUFF StandardError Is the class from which almost all exceptions derive 
(except for SystemExit and KeyboardInterrupt). Catching it also 
catches any exception that derives from it. We then translate the 
exception into a TypeError for the purpose of logging the result or 
printing a message. 

Tip Use this checking only with critical code. Usually, this level of checking is overkill, 

and you can assume that your users will follow the documentation you provided with 
your function (you did provide documentation, right?). 
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Classes Just for Data 


Python class instances are a lot like dicts—both can store keys and associated values. In 
fact, in many ways, instances are simply a thin wrapper around dicts, and attribute syntax 
is easier to read than dict syntax. 


IE@XO) 5 loyelie a class attribute 
OO || Noa! | a dictionary key 


Python programmers commonly create classes with no methods (a data-only class) for 
manipulating key/value data. Here's a class that makes it easy to initialize the class 
instance with some values (courtesy of Alex Martelli): 


class Bunch: 
Geir ime (selity “*lavaigis)) § 
Selir,  Glicic .wisclelieS (lawauecis)) 


When creating the instance, you just pass it one or more keyword arguments. In this 
example, we create an instance of Bunch and store the current working directory by using 
the keyword start dir: 


>>> import os 

SSS Comic = Bulineln (Siceucic_Clir=os.cieicexsel () ); 
SSS COiutlG).siveec Gliic 

'/Users/firecat' 


For more info about Bunch, see this Web page, part of the Python Cookbook: 


http://www.aspn.ActiveState.com/ASPN/Cookbook/Python/Recipe/52308 


Getting Close Enough with difflib 


People often make mistakes when entering text. The difflib module gives you a way to 
find strings that are close but not exact matches to a given string: 


SSS almooueie Glabit i a6 
>>> right = 'The quick brown fox' 
>>> wrong = 'THe quack brown fix' 


>>> matcher = difflib.SequenceMatcher (None, right, wrong) 
>>> print matcher.ratio() 
0.842105263158 


The ratio() method returns a floating point number between 0 and | that indicates how 
close the match is. Higher numbers indicate a closer match. A match higher than 0.6 is 
usually considered "good" (maybe not by medieval manuscript-illuminating monks, but 
good enough for the modern world). 
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To compare a single word against a list of words, use the difflib module's 
get_close _matches() method. It finds the words with the highest match ratio: 


>>> import difflib 


SS> mvyllist = [vemiek”, Yoquaek"’, “emer”, “emeiric’ | 
SSS Gir luo cei Close _ meres ("emoee”,, myLusic) 
| Yoraizerc I] 


DSU! DSU! Rah rah DSU! 


DSU stands for "decorate, sort, undecorate," and it's a clever trick for sorting sequence 
objects by one of their elements. 


TECHNICAL This use of the word decorate doesn't have anything to do with the 
STUFF decorator feature discussed in Chapter 16. Python has just gotten so 
big that it's having to recycle the words it uses to describe tools! 


Suppose you have a list of address tuples, like this: 


addresses = [ 
(1123 Mean Sie’, VAyeermimn’, 9 iLesae") 
(YI2ZS Secomel Sit’, 'Orlmexirows!, “S321 ) 
(T4356 Sacoinel Sie’, VNewiaercewalilike!, "il iailil)) , 
('777 Morris St', 'Filenes Basement', '99999") 
] 


Now suppose you want to sort them by zip code. The sort () method of lists lets you 
pass a comparison function, but we don't recommend that because it's clumsy and slow. A 
better way is to create a new list that is decorated with the zip field, so that each tuple 
becomes a 2-tuple where the first element is the zip code and the second element is the 
old tuple: 


(VR23E5) 2 (VIS Mesa Sicl, Vaiyncowa’, YI2345"7) ) 


After you have the list of 2-tuples, you can sort the list and then wndecorate it (remove 
the extra zip field). 


These two functions do DSU on such a list. The first one uses a list comprehension to 
create the tmp list and the return value. (See Chapter 16 for more on list 
comprehensions.) 


Cet Es _zijs (aciheesses) s 


tmp = [(x[2], x) for x in addresses] 
tmp.sort () 
return [x[1] for x in tmp] 


Here's what happens when you run the function: 
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SSS ESul_Zaijs (aclclesses)) 

[ (7456 Secoine Sic’, VNOwnerenALlile!, Vill), (7123 Mead Sie’, 
Viowcowa! , VILASaS') , (VL23) Seeouc Sie, VOremeccown, Vs szd')) , 
('777 Morris St', 'Filenes Basement', '99999") ] 


Tip There's another way to perform a DSU sort. Python 2.4 adds a new key parameter to 
the sort () method and the sorted() built-in function. Use this with the 
itemgetter() class from the operator module for a simpler DSU-type solution: 


>>> import operator 

SS> ielmalieel UicEM = GPSiesieOie, LieSmMg Sic Sie (2) get 3rd item from tuple 
SSS achlceSesSes . SORE (key—iclniied! dice) 
>>> addresses 
[C7456 Secomel Se', VNowlercevyiileY, Yt), (7 il23 Meakin Sie, 
UNoycoria’ , VIZ iy, ("L223 Se@omcl Sic', Orascom! , YSAS3zZi1)), 
('777 Morris St', 'Filenes Basement', '99999') ] 


TECHNICAL The key parameter requires a function, which itemgetter () 
STUFF supplies. A similar att rgetter() function works with class 
instances to retrieve attribute values based on their names. 


Simplifying Choices Using Dicts 


Typing lots of if/elif/else statements can get rather tedious, as the following example 
shows. In the example code, account_type is a string and CheckingAccount, 
SavingsAccount, and GenericAccount are Classes or functions (for this purpose it 
doesn't matter which): 


1 ACCCUINE yee == "Claeclziimg)” g 
new_account = CheckingAccount () 
elit aceoume iyoe == " Senvaumgsi" 5 
new_account = SavingsAccount () 
else: 
new_account = GenericAccount () 


If you're checking a single expression against a bunch of constants, as in the preceding 
code, using a dict to store the constants makes more sense: 


accounts = { 
"checking': CheckingAccount, 
"savings': SavingsAccount, 
} 
new_account = accounts.get(account_type, GenericAccount) () 


The preceding line of code contains a few useful techniques: 
e The get () method of dicts lets you specify a value to return if the key isn't found. 


Here, if the account type isn't in the accounts dict, the get () method returns 
GenericAccount. 
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e CheckingAccount and SavingsAccount don't have parentheses after them inside 
the dict because they're being stored rather than called. 

e The set of parentheses at the end of the line after the get () call specify that the 
function or class that get () returns should be called. 


Singles Going Steady 


Sometimes you want to make sure that there is only one of an object (called a singleton) 
within your program. A common example might be an object that holds configuration 
info. There are two standard ways to do this in Python: creating a singleton object and 
creating a Borg Pattern. 


Using a singleton object 


One way to make sure you have a unique object is to create a type of object that's a 
always a singleton. Python has two: modules and classes. (Classes are singletons, but 
instances aren't.) The following example creates a class called config and stores an 
attribute in it. You can't create another Config class in your module, so you can be sure 
that Config.start dir is unique. 


class Config: 
pass 


GOmir ike) Siveucic Clic = OS .GeieCwel () 
Calling in the Borg Pattern 


You can make a special kind of class so that all of its instances share the same state 
(which makes them functionally identical). That means whatever you do to one instance 
is also done to all the other instances. 


Tip To create this class, use the following brilliant code from Alex Martelli called the 
Borg Pattern: 


Class Comite s 
_ Sihaxecl Scans = {} 
Git  almadic == (S@1i)) s 
selit, Cict = S6li, Saree Siece 


Here's what happens when you test it. This code shows that the two Borg instances are 
different objects (they are stored in different locations): 


>>> borgl = Config() 
>>> borg2 = Config() 


SSS leeuecil 
<__main__.config instance at 0x584e0> 
SSS leeueejZ 
<__main__.config instance at 0x584b8> 
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This code shows that both instances share state—when an attribute is assigned to borg2, 
the same attribute now exists in borgl. 


SoS lOWGZ Sie Clic = OS .6Siecuwel () 
SSS jeiewline loowe2.Siveicie _Clilic 
/Users/firecat/StefPythonProgs 
SSS joweiiatc lowell .siceueic_clalae 


/Users/firecat/StefPythonProgs 


For more about the Borg Pattern, see 


http://www.aspn.ActiveState.com/ASPN/Cookbook/Python/Recipe/6653 1 


Chapter 22: Ten Great Resources 


This chapter lists ten resources to check out to find out more about Python, to get more 
help with your programs, and to get involved with the Python community. 


The Mothership: http://www.python.org 


The Python mothership on the Web is docked at http://www.python.org. If you have 
questions about Python, you should go here first. It includes the latest versions of Python 
for download, along with documentation, FAQs, introductions, tutorials, a Wiki (an 
encyclopedia-type structure that anyone can edit), mailing list archives, announcements, 
links to third-party modules, and more. 


A search box is available on most of the main pages of http://www.python.org. If you 
want to search specific parts of the site or specify lots of search criteria, use the search 
page, available here: 

http://www.python.org/search/ 

The following sections highlight important parts of the http://www.python.org Web site. 


We're glad you asked that: Python FAQs 


Python FAQs (Frequently Asked Questions) and a Spanish translation can be found at 
http://www.python.org/doc/faq/. The FAQs cover such topics as 


e What Python is, how to get a copy, who uses it and for what, and Python's design 
philosophy 

e¢ How-tos for common programming tasks 

e Installing and using third-party libraries 
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e Writing extensions to Python in other languages 
e Python's behavior in Windows 
¢ Graphical User Interface (GUI) programming 


Official documentation 


You can always find the documentation for the latest version of Python at 
http://www.python.org/doc/current/. Documentation for earlier versions of Python is also 
available. You can view the docs on the Web site or download them in a variety of 
formats. 


In addition to the docs for specific versions, you can find general guides covering a 
variety of subjects: 


e Beginner's guides (for people who haven't programmed before) 

e Introductions to Python (some for people who have used other languages and 
some for beginners; tutorials are also included) 

e A PythonInfo Wiki—a collection of pages that anyone can edit 

e Topic guides—articles about databases, parsers, scientific uses of Python, Web 
programming, XML, and more 

e A list of books and reviews of books written about Python, organized by topic 


TECHNICAL Python Enhancement Proposals (PEPs) are documents of the process 

STUFF of adding new features to Python. An archive of PEPs for features 
added to previous versions of Python is available. PEPs aren't 
documentation per se, but they usually include an explanation of a 
feature's benefits, which can be very helpful for understanding its 
uses. Summaries of arguments for and against the feature shed light 
on the philosophy behind the Python language. An index of PEPs is 
here: 


http://www.python.org/dev/peps/ 


If you're a history buff, you'll like PEPs because they include background information 
about how proposals developed. 


tutor@python.org and help@python.org 


The Tutor mailing list is for beginning and intermediate Python programmers to ask and 
answer each other's questions. Its Web page is here: 


http://www.mail.python.org/mailman/listinfo/tutor/ 
You can post a message to it by sending e-mail to tutor@python.org. We recommend that 


if you post a message, you also subscribe to the list! You can do so via a form on the 
above Web page. 
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Tutor mailing list archives are at http://www.mail.python.org/pipermail/tutor/. When you 
use the search box on the Python home page, you get results from the Tutor Archives as 
well as from other pages on the site. 


If you have a question that you want answered privately, and you can't find information 
about it by searching the Python Web site, you can send e-mail to helpWpython.org, 
which is staffed by volunteer Python experts. 


The comp.lang.python Newsgroup 


The comp. 1lang.python Usenet newsgroup is the main source for a variety of discussions 
about Python. Both beginners and advanced programmers post to the group, and contents 
include examination of programming problems, discussions of Python compared with 
other languages, discussions (often vociferous!) of proposed new Python features, and 
more. 


You can read comp. 1lang.python either with netnews software or as a mailing list. 
Because it receives hundreds of posts a day, we recommend reading it via Usenet; news- 
reading software generally handles complex message threading better than e-mail 
software. 


Cheese Shop: Online Collection of Python Modules 


Cheese Shop, formerly called PyPI, maintains a collection of third-party Python modules. 
It lives on the http://www.python.org Web site. Its home page is here: 


http://www.cheeseshop.python.org/ 


As of this writing, it contains 1,429 packages. Most of them are available for free (but 
check the licensing scheme before you distribute them or use them in commercial 
applications). You can search packages by name or by keyword, and you can browse a list 
of packages organized by topic and by type (for example, you can browse for packages 
that work on a particular operating system). You can also submit your own packages. 


TECHNICAL Cheese Shop is named after the Monty Python sketch of the same 


STUFF name, in which a customer discovers that the cheese shop doesn't, in 
fact, have any cheese for sale at all. 
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Random Access Reference at 
http://www. wiki.python.org 


The Web page http://www.wiki.python.org/ contains links to two Wikis: the Python Wiki 
(for Python) and the Jython Wik:. (Jython is Python for Java.) 


A Wiki is a collection of Web pages that anyone can edit. Wikis usually make heavy use 
of hyperlinks, and you gather information by jumping from one page to another as you 
find linked topics that interest you. 


The front page for the Python Wiki contains a list of starting points for jumping in. This 
includes a Code page that lists bits of useful code that are too small to count as modules. 
One of the items on the Code page extracts a useful bit of information about string 
formatting that's buried in the official documentation for sequence types: 


http://www.wiki.python.org/moin/StringFormatting/ 


Tip The Python Wiki is a work in progress and doesn't include all the information 
available on other pages of the http://www.python.org Web site. But it does contain 
some information that isn't available on other parts of the site. It can be a good place 
to explore if you like to learn by ingesting small chunks of information. As you 
become more comfortable with Python, you might want to contribute your own 
pages. 


The Python Cookbook, part of the ActiveState Programmer Network Web site, collects a 
variety of recipes, or short bits of code or programs that perform particular tasks and can 
be freely used in your own programs. The home page is here: 


http://www.aspn.activestate.com/ASPN/Cookbook/Python/ 
Editors review submissions to the Cookbook, and you have the option of searching the 
editors' favorites, approved, and not-approved recipes. The recipes are periodically 


collected into a dead-trees book. 


Each recipe is displayed on its own page with a description, source code, and discussion. 
Members of ASPN can add comments to the page 
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The Latest News 


The Python newsgroup comp. 1lang.python is an excellent resource for finding out about 
Python and joining in the community. But if you don't have time to read 200+ posts every 
day, you can find out what's going on in the Python community in the following ways. 


Dr. Dobbs’ Python-URL 


Dr. Dobbs' Python-URL is a weekly summary of hot topics in comp.lang. python and 
other doings. It's posted weekly to comp. 1lang.python and 
comp.lang.python. announce. 


You can get past issues from Google Groups, an archive of Usenet newsgroup postings. 
Go to http://www.groups.google.com/groups/ and type dr dobbs python-url1! in the 
Search box. 


Daily Python URL 

Daily Python URL 1s a blog (Web log) from http://www.pythonware.com that includes 
links to news about Python, additions to the Cheese Shop module library, and more. Read 
it here: 

http://www.pythonware.com/daily/ 

comp.lang.python.announce 

The comp. lang.python. announce newsgroup is a moderated forum for Python-related 
announcements. Availability of new modules and programs and requests for comments on 


Python Enhancement Proposals (PEPs) are posted here. This newsgroup receives only a 
few messages a day, so it's easier to keep up with than comp.lang.python. 


Being a PUG-nosed PIGgie: Local User Groups 


Python user groups gather in what computer nerds call "meatspace" in cities throughout 
the world. Such groups welcome beginners and provide demonstrations, lectures, and 
time for questions and socializing. (Sometimes you get free food, too!) The groups often 
have the acronym PIG ("Python Interest Group") or PUG ("Python User Group") in their 
names. 


A list of local user groups can be found on the Python Wiki here: 


http://www.wiki.python.org/moin/LocalUserGroups/ 
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The Python Wiki might not be up-to-date, so if you don't see a group in your area, be sure 
to search the Web and/or post to comp. 1lang.python to see if others know of a group 
there. 


Many local user groups maintain mailing lists in which you can read announcements 
about local meetings and participate in discussions about Python. Some of the groups 
have archives of past messages that are available to non-subscribers so you can check out 
whether the group meets your needs. 
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Part VI: Appendixes 


Chapter List 


Appendix A: Getting and Installing Python 
Appendix B: Python Version Differences 


The 5" Wave By Rich Tennant 


In this part ... 


If you weren't lucky enough to have Python already installed on your computer, 
Appendix A shows you how to download and set up Python for your operating system. 


Appendix B describes the new features in each new version of Python starting with 
Python 2.0. 


Appendix A: Getting and Installing 
Python 
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This Appendix contains instructions for installing Python and its modules, the Python 
development environment (IDLE), and Python documentation on the three major 
operating systems: Windows, Mac OS, and UNIX. This chapter also provides tips for 
using Python when it is embedded in your application. 


Tip All versions of Python are available for download here: 


http://www.python.org/download 


Operating Systems 


To install Python on your Windows, Mac OS, or UNIX computer, follow the instructions 
in the appropriate subsection below. 


Windows 


You can find out whether Python is already installed on your Windows system. If you 
haven't already installed Python, or if you have an older version of it, we show you how 
to install the latest version. 


Checking for Python 
There are three ways to find out whether Python is installed on your Windows system: 


e Look for Python in the Start menu. 
e Use the Search command to search for python. exe. 
e Type python in the Command Prompt window. 


Using Embedded Python 


If you have an application that uses Python as its scripting language, you're in luck! 
Python makes it easy to customize your application. 


You'll be able to use the full power of the Python language as described in this book, but 
there are some issues to be aware of: 


e The embedded Python might not include all the standard Python modules. 
You won't be able to use the modules that aren't included in scripts for your 
application. 

e You won't be able to upgrade the embedded version of Python. You'll have to 
wait for an upgrade to the application itself. 

Warning You can still install a newer version of Python elsewhere on your 
system. But be sure not to overwrite the version that the application 
uses. 
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e You may want to test your scripts outside the application. Depending on how 
the application is set up, a mistake in your script could cause the application to 
crash or hang. 


Using Embedded Python 


If you have an application that uses Python as its scripting language, you're in luck! 
Python makes it easy to customize your application. 


You'll be able to use the full power of the Python language as described in this book, but 
there are some issues to be aware of: 


e The embedded Python might not include all the standard Python modules. 
You won't be able to use the modules that aren't included in scripts for your 
application. 

e You won't be able to upgrade the embedded version of Python. You'll have to 
wait for an upgrade to the application itself. 

Warning You can still install a newer version of Python elsewhere on your 
system. But be sure not to overwrite the version that the application 
uses. 

e You may want to test your scripts outside the application. Depending on how 
the application is set up, a mistake in your script could cause the application to 
crash or hang. 
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